Microsoft sees all the big companies creating original video content, and they want to get in on the fun. They have decided to create a new movie studio, but the problem is they don't know anything about creating movies. They have hired you to help them better understand the movie industry. Your team is in charge of doing data analysis that explores what types of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the CEO can use when deciding what type of films they should be creating.
These are the 5 datasets that were used in this project.
1. Movies.csv
2. Movie_basics.csv
3. Tn.movie_budgets.csv.gz
4. Movie_ratings.csv
5. Bom.movie_gross.csv.gz
The aim of this project is to advise Microsoft on producing a profitable, well-liked movie or movies at the box office. To do so, the analysis focuses on Return on Investment, Net Profit Margin, profitability, losses and ticket sales. The data is categorized by system rating, which signifies the type of audience that will potentially be targeted, and this is explored throughout the nine insights. The subjects of the nine insights in this project are: Return on Investment (ROI), Net Profit Margin (NPM), Predictive Analysis, Expenses, Movies that Made a Profit, Movies that Had Losses, Top 20 Highest Profitable Movies vs. Top 20 Lowest Profitable Movies, The Most Successful Movie in the Drama Genre, and Tickets Sold.

The project meets its objective by extracting the data and expressing it through data visualization concepts throughout the nine insights. This is done in order to explore the data, to create structures that allow presentations with useful information and context, and to compare each genre and system rating with the others.

Based on the dataframe, the main genres are Action and Adventures, Drama, Comedy, Documentary, Classics, Art House and International, Musical and Performing Arts, Horror, Mystery and Suspense, Animation, Special Interest, and Kids and Family. There are also five main system rating groups: System R Rating, System PG-13 Rating, System PG Rating, System G Rating and System NR Rating. In this project the four genres being used are Adventure, Action, Drama and Comedy, which are also the four most used genres in the film industry. Each of them gets its own data analysis with the same data visualization concepts, approaches and nine insights. The system ratings are compared with one another throughout the analysis to see which one best fits each genre.
There will be various visualizations for each genre describing and portraying the nine insights based on the activity of the data. However, the ultimate aim is not just to satisfy the approaches and insights, or in other words to be biased, but also to analyze the data through an objective lens.
ROI and NPM overall conclusion: Knowing the amount of revenue generated every year is not a good enough guide for studios to know how well they are doing. Some movies generate a lot of profit because of the studio's size and abundance of resources; other studios increase their profit but spend too much money to do so. ROI and NPM give a better sense of a studio's performance by relating the profit generated to the budget.
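The notebook does not spell out the formulas behind these two metrics at this point, so the block below is only a sketch that assumes the standard definitions, with net profit taken as gross revenue minus total costs.

```latex
\[
\mathrm{ROI} = \frac{\text{gross revenue} - \text{budget}}{\text{budget}} \times 100\%
\qquad
\mathrm{NPM} = \frac{\text{gross revenue} - \text{total costs}}{\text{gross revenue}} \times 100\%
\]
```

As a hypothetical illustration, a movie that costs 50 and grosses 150 would have an ROI of 200% and, if the budget is its only cost, an NPM of about 66.7%.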
What is Predictive Analytics? Predictive analytics uses data, statistical algorithms and machine learning techniques to estimate the likelihood of future outcomes or circumstances based on historical data. The objective is to go beyond knowing what has happened and provide the best assessment of what will happen in the future. Companies employ predictive analytics to find patterns in data and to identify risks and opportunities. In this approach linear regression and classification analysis will be used to make predictions about future outcomes and the performance of significant factors about the movies in this project.
What is Linear Regression? Linear regression is useful for finding the relationship between two continuous variables. One is the predictor, or independent variable, and the other is the response, or dependent variable. Linear regression looks for statistical relationships, not deterministic ones. The relationship between two variables is said to be deterministic if one variable can be expressed exactly in terms of the other; a statistical relationship does not determine one variable exactly from the other. Linear regression uses the linear relationship between the dependent and independent variables to predict future outcomes.
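To make the idea concrete, here is a minimal sketch of an ordinary least squares fit with one predictor, written in plain Python. It is not the model used later in this notebook, and the budget and opening weekend values are hypothetical toy numbers, not project data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy values in millions of dollars (NOT taken from the datasets).
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
opening_weekend = [6.0, 9.0, 16.0, 19.0, 25.0]

slope, intercept = fit_line(budget, opening_weekend)
predicted = intercept + slope * 60.0  # predicted opening weekend for a budget of 60
print(f"slope={slope:.3f}, intercept={intercept:.3f}, prediction={predicted:.2f}")
```

The fitted line is a statistical relationship: the toy points do not all lie on it, yet it still gives a best linear guess for a budget that was not observed.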
What is Classification Analysis? Classification analysis is a data mining method used to classify unstructured data into structured classes and groups, which assists in the discovery of hidden information and in future planning. Classification analysis can be used to answer a question, make a decision or predict behavior through the use of machine learning. It works by developing a set of training data that contains a set of attributes as well as the likely outcome. The job of the classification algorithm is to discover how that set of attributes reaches its conclusion.
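The training-data idea can be sketched with a very small classifier. The example below uses a plain-Python 1-nearest-neighbour rule as a stand-in; it is not claimed to be the classifier used in this project, and the attribute values and class labels are hypothetical.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query` (1-nearest-neighbour)."""
    _, label = min(train, key=lambda row: math.dist(row[0], query))
    return label


# Hypothetical training data: (budget in $100M, release month / 12) -> outcome class.
training = [
    ((0.10, 1 / 12), "low opening"),
    ((0.20, 2 / 12), "low opening"),
    ((1.50, 6 / 12), "high opening"),
    ((2.00, 7 / 12), "high opening"),
]

# Classify a new, unseen movie from its attributes.
prediction = nearest_neighbour(training, (1.80, 6 / 12))
print(prediction)
```

Each training row pairs a set of attributes with a known outcome, and the new movie receives the outcome of the most similar known movie.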
| # | Hypothesis | Method Used | Type of Visualization |
|---|---|---|---|
| 1 | If the budget of a movie increases, does the opening weekend and profit also increase? | Linear Regression | Animated 3D Scatter Plot |
| 2 | If the movie was released in any particular season does it affect the opening weekend and profit of that movie? | Classification Analysis | Animated 3D Scatter Plot |
| 3 | Based on the amount of budget used to create the movie, if the movie was released in any particular month will it increase the revenue of the movie? | Classification Analysis | Animated 3D Scatter Plot |
| 4 | Based on the amount of budget used to create the movie, if the movie was released in any particular season will it increase the profit of the movie? | Classification Analysis | Animated 3D Scatter Plot |
| 5 | Based on the amount of budget used to create the movie, if the movie was released in any particular season in any particular month within that season, will it increase the opening weekend of the movie? | Classification Analysis | Animated 4D Scatter Plot |
The explanation of the hypotheses:
"If the budget of a movie increases, does the opening weekend and profit also increase?"
The purpose of this hypothesis is to see whether the budget of a movie is a significant linear predictor of the opening weekend and the profit of a movie. In other words, is it possible that the higher the budget spent to create a movie, the higher the opening weekend and profit? The opening weekend matters here because it is an indicator of the success of a movie.
"If the movie was released in any particular season does it affect the opening weekend and profit of that movie?"
Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The aim is to use the season as a significant predictor of opening weekend and profit. The classification model will be used to create categories that help predict the opening weekend and profit based on the season the movie was released in.
"Based on the amount of budget used to create the movie, if the movie was released in any particular month will it increase the revenue of the movie?"
Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The aim is to use the budget used to produce the movie as a significant predictor of which month the movie should be released in to generate a certain amount of revenue. The classification model will be used to create categories based on the budget that was used to get a certain amount of revenue, given the month the movie was released.
"Based on the amount of budget used to create the movie, if the movie was released in any particular season will it increase the profit of the movie?"
Classification analysis is used in this approach to identify and assign categories to this data set to allow predictive behaviour. The aim is to use the budget as a significant predictor of which month within a particular season the movie should be released in to generate a particular opening weekend. The classification model will be used to create categories based on the budget that was used to get a certain opening weekend, given the month within a particular season that the movie was released in. The predicted opening weekend and the initial budget used in this classification analysis can then serve as the x and y variables in the linear regression analysis that predicts the profit of movies from the budget and opening weekend.
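Several of these hypotheses depend on knowing the month and season of a release. The sketch below shows one way to derive both from release strings in the two formats that appear in the datasets ('June 13, 1980 (United States)' and 'Dec 18, 2009'). The helper names are hypothetical, and the season boundaries are an assumption (meteorological seasons for the Northern Hemisphere), since the notebook does not state how seasons are assigned.

```python
MONTH_ABBREVIATIONS = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def release_month(released):
    """Return the month number (1-12) from a string that starts with a month name."""
    month_name = released.split()[0]
    # Comparing the first three letters handles both 'June' and 'Jun'.
    return MONTH_ABBREVIATIONS.index(month_name[:3]) + 1


def release_season(released):
    """Map a release string to its season via the month number."""
    return SEASON_BY_MONTH[release_month(released)]


for released in ["June 13, 1980 (United States)", "Dec 18, 2009"]:
    print(released, "->", release_month(released), release_season(released))
```

With month and season available as categories, they can be used as the class labels or attributes in the classification analyses described above.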
Top 20 Highest Profitable Movies
What makes a movie a blockbuster? These are some factors in creating blockbuster movies:
1. Size: A blockbuster sometimes creates new market flow worldwide by having a multi-dimensional impact on the industry and audience. Sales records are broken and expectations are more than met with blockbuster movies.
2. Speed: The volume of sales is not the only characteristic; the speed of the sales trajectory matters too. The essence of a blockbuster is to shatter anything in its way in a short period of time. Blockbuster brands address pressing consumer needs so well that they often enjoy vertical sales lift-off.
3. Scarcity: In the market, stock-outs and shortages normally happen when a blockbuster brand is in high demand. The speedy availability of counterfeits, as when the new iPhone was in high demand, is another indicator of popularity.
4. Sustainability: A blockbuster brand is not a one-hit wonder. It is a gift that keeps on giving, just like the seven 'Harry Potter' books and their companion movies, along with DVD and merchandise sales, theme parks and so on. The 'Harry Potter' economy is valued at 15 billion dollars.
5. Sizzle: A blockbuster movie is not just in high demand; it is magical and addresses an important need in an exciting and accessible way, just like the memorable and magical special effects in the 'Star Wars' series.
Factors that contribute to a movie's inability to be successful in ticket sales at the box office:
1. Budget: It is important for the box office to be in proportion to the budget. To be somewhat profitable, a movie must generally take in at least 2 dollars at the box office for every dollar spent. For example, 'John Carter (2012)' had a worldwide gross of 284 million dollars and a domestic box office of 73 million dollars. The movie cost 250 million dollars, so the studio made a big loss and the production company made no money.
2. Timing: Timing is very important when releasing a movie. Poor timing can keep the right audience from going to the theater because other things are occupying their attention, such as major sports events, fun fairs, carnivals, concerts or bad weather. It is especially important not to release a movie while it is competing with a film of the same genre and a similar plot, or too soon after a similar film that has already absorbed the audience's interest in the film's premise.
3. Bad Buzz: Bad buzz is created around a movie by people who have already seen it, by articles and social media news about production problems, by bad ratings and reviews from critics, or by poor word of mouth. All of this contributes to the inability of the movie to be successful.
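The two-dollars-per-dollar rule of thumb can be checked with a few lines of arithmetic. The sketch below applies it to the 'John Carter (2012)' figures quoted above; the function names and the `required_multiple` parameter are hypothetical helpers introduced for illustration.

```python
def box_office_multiple(worldwide_gross, budget):
    """How many dollars the movie took in per dollar of budget."""
    return worldwide_gross / budget


def clears_rule_of_thumb(worldwide_gross, budget, required_multiple=2.0):
    """True if the gross is at least `required_multiple` times the budget."""
    return box_office_multiple(worldwide_gross, budget) >= required_multiple


# Figures quoted in the text for 'John Carter (2012)', in millions of dollars.
john_carter_gross = 284
john_carter_budget = 250

multiple = box_office_multiple(john_carter_gross, john_carter_budget)
passed = clears_rule_of_thumb(john_carter_gross, john_carter_budget)
print(f"multiple = {multiple:.2f}, clears the 2x rule: {passed}")
```

A multiple only slightly above 1 is far short of the 2 that the rule asks for, which is consistent with the loss described in the text.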
Script and Screenplay: A good story, script and plot are among the great staples of a successful movie, and among the most significant things about any great film. If the story of the film is really out of the box and truly gripping, the film can become a blockbuster. If it captures attention and interest, most viewers will give it an A rating. After a good story and plot comes the screenplay. The screenplay determines how the story is shown to the audience, and in what sequence, so that the audience goes through a spectacular journey or fantastic experience. Setting the correct order of what to show and when is really important. This can also be seen in trailers and advertisements, which let the audience relate to the characters in the story and start to learn about the plot of the movie.
Directors and Cast/Actors: A great director can make a movie successful by putting things in motion and bringing the cast together to give the movie a quality appearance and feel. The cast is also a significant factor: chemistry within the cast can make a movie extremely successful, which has been proven historically.
Differentiation and Mass Appeal: This may be a little far-fetched, but it can have its entertainment quality. If the film produces scenes that the audience has not seen before, even if that means the actors step outside their usual comfort zone in the characters they normally play, then the movie may be interesting enough to be good. Over and above all the other factors, mass appeal may be the number one criterion for the success of a movie. If the studio is able to please a large audience, the movie may be a hit. One of the best ways to capture the largest audience is to make the topic a little controversial. The controversy may start discussions and debates on social media platforms or among friends and family. Getting people to talk may entice the public to flock to the theaters to see what the commotion is about.
When it comes down to being successful at the box office, the recipe is pretty simple: small budget + massive ticket sales = huge profit. Done correctly, this means an enormous ROI and NPM will be waiting for the studio. In this approach ROI is used to tell the top 20 profitable movies apart and see which of the blockbusters is the most successful. According to The Numbers, three movies have mastered the money-making recipe and became extremely profitable with a strong ROI.
The Top 3 Highest ROI Movies
| # | Movie | Budget | Return on Investment |
|---|---|---|---|
| 1 | Deep Throat (1972) | 25,000 | 22,528,467 |
| 2 | Facing the Giants (2006) | 100,000 | 38,551,255 |
| 3 | Paranormal Activity (2007) | 450,000 | 89,376,549 |
1. Deep Throat (1972) is an American pornographic film that was at the forefront of the Golden Age of Porn (1969-1984). The film was written and directed by Gerard Damiano. It was one of the first pornographic films to feature a plot, character development and relatively high production values. 'Deep Throat' earned mainstream attention and launched the "porno chic" trend. It ended up earning an ROI of 90,014 percent, a figure that has held the top spot for the past 50 years.
2. Facing the Giants (2006) is a Christian drama with a sports sub-genre. Movies with a sports theme often became major box office hits during the 2000s, turning modest budgets into box office blockbusters. Facing the Giants ended up with an ROI of 38,451 percent.
3. Paranormal Activity (2007) was written and directed by Oren Peli and was a classic horror film of 2007. The movie is about a young couple haunted by a supernatural entity in their house. It ended up with an ROI of 19,761 percent.
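The three ROI percentages can be reproduced from the budgets and dollar figures listed for these movies. The sketch below assumes that the dollar figure given next to each budget is the gross revenue, and that ROI is (gross - budget) / budget x 100; under that assumption the quoted percentages come out exactly. `roi_percent` is a hypothetical helper name.

```python
def roi_percent(gross, budget):
    """Return on investment as a percentage of the budget."""
    return (gross - budget) / budget * 100


# (budget, listed dollar figure), with the listed figure treated as gross revenue.
films = {
    "Deep Throat (1972)": (25_000, 22_528_467),
    "Facing the Giants (2006)": (100_000, 38_551_255),
    "Paranormal Activity (2007)": (450_000, 89_376_549),
}

roi = {
    title: round(roi_percent(gross, budget))
    for title, (budget, gross) in films.items()
}
for title, value in roi.items():
    print(f"{title}: {value:,} percent")
```

The ordering by ROI matches the ranking above even though Paranormal Activity has the largest dollar figure, which is the reason a ratio is more informative than raw revenue here.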
G Rated Movies
PG Rated Movies
PG-13 Rated Movies
R Rated Movies
NC-17 Rated Movies
Before extracting any data, libraries such as Altair, Pandas and Pandas_highcharts need to be installed, and built-in modules such as Collections need to be imported, to help create insightful interactive graphs.
The 'Pandas' library is used to read the data from CSV files and load it into a dataframe.
# importing module
import pandas as pd
Seaborn is a library based on matplotlib that will be used for data visualization in this analysis.
import seaborn as sns
Collections is a built-in Python module; in this analysis its 'Counter' class will be used to detect repetition in a list.
# importing module
from collections import Counter
Matplotlib is a library that will be used in this analysis for creating visualizations.
# importing module
import matplotlib.pyplot as plt
Statistics is a module in python that provides calculations of mathematical statistics. These calculations are simple math problems such as mean, median, mode, variance and standard deviation.
# importing module
import statistics
Plotly.graph_objects is a module that will be used in this analysis for creating graphs and charts.
# importing module
import plotly.graph_objects as go
Numpy is a Python library that will be used in this analysis to create arrays.
# importing module
import numpy as np
The Math library is a built-in Python module that does mathematical calculations.
# importing module
import math
IPython.display is a module that will be used in this analysis for converting graphs and charts made in python to html for displaying visuals.
# importing module
from IPython.display import display_html
Collections is a built-in Python module; in this analysis its 'defaultdict' will be used to detect duplicates in a list.
# importing module
from collections import defaultdict
The 'scipy.stats import norm' module will be used to visualize the Normal Distribution of the data.
# importing module
from scipy.stats import norm
Dataframe_image is a module that will be used in this analysis to save dataframes as pictures.
# importing module
import dataframe_image as dfi
The 'scipy.stats import kurtosis' module will be used to get the Kurtosis of the distribution of the data.
# importing module
from scipy.stats import kurtosis
The 'scipy.stats import skew' module will be used to get the Skewness of the distribution of the data.
# importing module
from scipy.stats import skew
The 'scipy import stats' module will be used to calculate the Trimmed Mean of the data.
# importing module
from scipy import stats
The OrderedDict is a data type in the collections module, it tracks the order in which items were added.
# importing module
from collections import Counter, OrderedDict
The Sqlite3 library will help insert and change rows and manage an SQL database file.
# importing module
import sqlite3
# importing module
from sklearn import linear_model
# importing module
from matplotlib.animation import FuncAnimation
# importing module
from mpl_toolkits.mplot3d import Axes3D
# importing module
import statsmodels.formula.api as smf
# rendering matplotlib figures inline in the notebook
%matplotlib inline
# importing module
from matplotlib import animation
# importing module
from matplotlib import cm
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
The pandas_highcharts.core library helps create interactive Highcharts graphs and charts.
# importing module
from pandas_highcharts.core import serialize
The pandas_highcharts.display module displays the interactive Highcharts graphs and charts in the notebook.
# importing module
from pandas_highcharts.display import display_charts
The json module is a built-in Python module for encoding and decoding JSON data.
# importing module
import json
The '%store' command saves dataframes, lists and any other instance so that they do not have to be computed or created again. They can easily be restored.
%store Drama_DataFrame
%store df_data
%store system_rating_r
%store system_rating_pg
%store system_rating_pg13
%store system_rating_nc17
%store system_rating_g
%store dataframe_RIO_r
%store dataframe_RIO_pg
%store dataframe_RIO_pg13
%store dataframe_RIO_g
%store dataframe_RIO_NC
%store df_cost_r
%store freq_dis
%store cum_rel_freq
%store freq_cum_dis
%store df_roi_r
%store freq_dis_roi
%store freq_cum_dis1
%store cum_rel_freq1
%store df_roi_per_r
%store freq_dis3
%store freq_cum_dis2
%store cum_rel_freq2
%store df1
%store df2
%store df3
%store df4
%store df5
%store df_opening
%store df_month
%store df_season
%store df_4D
%store df_profit_season
Stored 'Drama_DataFrame' (DataFrame)
Stored 'df_data' (DataFrame)
Stored 'system_rating_r' (DataFrame)
Stored 'system_rating_pg' (DataFrame)
Stored 'system_rating_pg13' (DataFrame)
Stored 'system_rating_nc17' (DataFrame)
Stored 'system_rating_g' (DataFrame)
Stored 'dataframe_RIO_r' (DataFrame)
Stored 'dataframe_RIO_pg' (DataFrame)
Stored 'dataframe_RIO_pg13' (DataFrame)
Stored 'dataframe_RIO_g' (DataFrame)
Stored 'dataframe_RIO_NC' (DataFrame)
Stored 'df_cost_r' (DataFrame)
Stored 'freq_dis' (DataFrame)
Stored 'cum_rel_freq' (DataFrame)
Stored 'freq_cum_dis' (DataFrame)
Stored 'df_roi_r' (DataFrame)
Stored 'freq_dis_roi' (DataFrame)
Stored 'freq_cum_dis1' (DataFrame)
Stored 'cum_rel_freq1' (DataFrame)
Stored 'df_roi_per_r' (DataFrame)
Stored 'freq_dis3' (DataFrame)
Stored 'freq_cum_dis2' (DataFrame)
Stored 'cum_rel_freq2' (DataFrame)
Stored 'df1' (DataFrame)
Stored 'df2' (DataFrame)
Stored 'df3' (DataFrame)
Stored 'df4' (DataFrame)
Stored 'df5' (DataFrame)
Stored 'df_opening' (DataFrame)
Stored 'df_month' (DataFrame)
Stored 'df_season' (DataFrame)
Stored 'df_4D' (DataFrame)
Stored 'df_profit_season' (DataFrame)
The '%store -r' command retrieves the dataframes or any instance that was stored by the %store command.
%store -r Drama_DataFrame
%store -r df_data
%store -r system_rating_r
%store -r system_rating_pg
%store -r system_rating_pg13
%store -r system_rating_nc17
%store -r system_rating_g
%store -r dataframe_RIO_r
%store -r dataframe_RIO_pg
%store -r dataframe_RIO_pg13
%store -r dataframe_RIO_g
%store -r dataframe_RIO_NC
%store -r df_cost_r
%store -r freq_dis
%store -r cum_rel_freq
%store -r freq_cum_dis
%store -r df_roi_r
%store -r freq_dis_roi
%store -r freq_cum_dis1
%store -r cum_rel_freq1
%store -r df_roi_per_r
%store -r freq_dis3
%store -r freq_cum_dis2
%store -r cum_rel_freq2
%store -r df1
%store -r df2
%store -r df3
%store -r df4
%store -r df5
%store -r df_opening
%store -r df_month
%store -r df_season
%store -r df_4D
%store -r df_profit_season
Before the graphs are made, the dataframes are extracted from the data files. The first dataframe, called movie_df, is extracted from the movies.csv file.
movie_df = pd.read_csv("movies.csv")
Checking the dataframe and getting the first five rows of the dataframe
movie_df.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shining | R | Drama | 1980 | June 13, 1980 (United States) | 8.4 | 927000.0 | Stanley Kubrick | Stephen King | Jack Nicholson | United Kingdom | 19000000.0 | 46998772.0 | Warner Bros. | 146.0 |
| 1 | The Blue Lagoon | R | Adventure | 1980 | July 2, 1980 (United States) | 5.8 | 65000.0 | Randal Kleiser | Henry De Vere Stacpoole | Brooke Shields | United States | 4500000.0 | 58853106.0 | Columbia Pictures | 104.0 |
| 2 | Star Wars: Episode V - The Empire Strikes Back | PG | Action | 1980 | June 20, 1980 (United States) | 8.7 | 1200000.0 | Irvin Kershner | Leigh Brackett | Mark Hamill | United States | 18000000.0 | 538375067.0 | Lucasfilm | 124.0 |
| 3 | Airplane! | PG | Comedy | 1980 | July 2, 1980 (United States) | 7.7 | 221000.0 | Jim Abrahams | Jim Abrahams | Robert Hays | United States | 3500000.0 | 83453539.0 | Paramount Pictures | 88.0 |
| 4 | Caddyshack | R | Comedy | 1980 | July 25, 1980 (United States) | 7.3 | 108000.0 | Harold Ramis | Brian Doyle-Murray | Chevy Chase | United States | 6000000.0 | 39846344.0 | Orion Pictures | 98.0 |
Using SQL to extract tables from the im.db file, starting by connecting to the file.
connection = sqlite3.connect("im.db")
Creating a cursor for the connection
cursor = connection.cursor()
Extracting the movie_basics table from the im.db file.
movie_basics_df = pd.read_sql("""SELECT * FROM movie_basics""",connection)
Checking the dataframe and getting the first five rows of the dataframe
movie_basics_df.head()
| | movie_id | movie | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|
| 0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| 1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| 2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| 3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| 4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
Extracting the movie_ratings table from the im.db file.
movie_ratings_df = pd.read_sql("""SELECT * FROM movie_ratings""",connection)
Checking the dataframe and getting the first five rows of the dataframe
movie_ratings_df.head()
| | movie_id | averagerating | numvotes |
|---|---|---|---|
| 0 | tt10356526 | 8.3 | 31 |
| 1 | tt10384606 | 8.9 | 559 |
| 2 | tt1042974 | 6.4 | 20 |
| 3 | tt1043726 | 4.2 | 50352 |
| 4 | tt1060240 | 6.5 | 21 |
Extracting data from the bom.movie_gross.csv.gz file and making it into a dataframe called movie_gross_df.
movie_gross_df = pd.read_csv("bom.movie_gross (5).csv.gz")
Checking the dataframe and getting the first five rows of the dataframe
movie_gross_df.head()
| | movie | studio | domestic_gross | foreign_gross | year |
|---|---|---|---|---|---|
| 0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010 |
| 1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010 |
| 2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010 |
| 3 | Inception | WB | 292600000.0 | 535700000 | 2010 |
| 4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010 |
Extracting data from the tn.movie_budgets.csv.gz file and making it into a dataframe called movie_budgets_df.
movie_budgets_df = pd.read_csv("tn.movie_budgets.csv.gz")
Checking the dataframe and getting the first five rows of the dataframe
movie_budgets_df.head()
| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
|---|---|---|---|---|---|---|
| 0 | 1 | Dec 18, 2009 | Avatar | $425,000,000 | $760,507,625 | $2,776,345,279 |
| 1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 |
| 2 | 3 | Jun 7, 2019 | Dark Phoenix | $350,000,000 | $42,762,350 | $149,762,350 |
| 3 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 |
| 4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | $317,000,000 | $620,181,382 | $1,316,721,747 |
Changing the column 'name' to 'movie' to be able to merge movie_df dataframe with others using the same column
movie_df.columns = movie_df.columns.str.replace('name', 'movie')
Checking the dataframe and getting the first five rows of the dataframe
movie_df.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shining | R | Drama | 1980 | June 13, 1980 (United States) | 8.4 | 927000.0 | Stanley Kubrick | Stephen King | Jack Nicholson | United Kingdom | 19000000.0 | 46998772.0 | Warner Bros. | 146.0 |
| 1 | The Blue Lagoon | R | Adventure | 1980 | July 2, 1980 (United States) | 5.8 | 65000.0 | Randal Kleiser | Henry De Vere Stacpoole | Brooke Shields | United States | 4500000.0 | 58853106.0 | Columbia Pictures | 104.0 |
| 2 | Star Wars: Episode V - The Empire Strikes Back | PG | Action | 1980 | June 20, 1980 (United States) | 8.7 | 1200000.0 | Irvin Kershner | Leigh Brackett | Mark Hamill | United States | 18000000.0 | 538375067.0 | Lucasfilm | 124.0 |
| 3 | Airplane! | PG | Comedy | 1980 | July 2, 1980 (United States) | 7.7 | 221000.0 | Jim Abrahams | Jim Abrahams | Robert Hays | United States | 3500000.0 | 83453539.0 | Paramount Pictures | 88.0 |
| 4 | Caddyshack | R | Comedy | 1980 | July 25, 1980 (United States) | 7.3 | 108000.0 | Harold Ramis | Brian Doyle-Murray | Chevy Chase | United States | 6000000.0 | 39846344.0 | Orion Pictures | 98.0 |
Changing the column 'primary_title' to 'movie' to be able to merge movie_basics_df dataframe with others using the same column
movie_basics_df.columns = movie_basics_df.columns.str.replace('primary_title', 'movie')
Checking the dataframe and getting the first five rows of the dataframe
movie_basics_df.head()
| | movie_id | movie | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|
| 0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| 1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| 2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| 3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| 4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
Changing the column 'title' to 'movie' to be able to merge movie_gross_df dataframe with others using the same column
movie_gross_df.columns = movie_gross_df.columns.str.replace('title', 'movie')
Checking the dataframe and getting the first five rows of the dataframe
movie_gross_df.head()
| | movie | studio | domestic_gross | foreign_gross | year |
|---|---|---|---|---|---|
| 0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010 |
| 1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010 |
| 2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010 |
| 3 | Inception | WB | 292600000.0 | 535700000 | 2010 |
| 4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010 |
After renaming the columns that hold the name of the movie so that every dataframe uses the same name, 'movie', the next code checks whether the 'movie' columns of the dataframes share movies, to see if the dataframes can be merged using the 'movie' column.
This is the 'common_data' function, created to check whether two lists have any elements in common.
def common_data(list1, list2):
    result = False
    # traverse in the 1st list
    for x in list1:
        # traverse in the 2nd list
        for y in list2:
            # if one common
            if x == y:
                result = True
                return result
    return result
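The nested loops stop at the first match, so the function only answers whether any title is shared. When planning a merge it can also help to know which titles overlap and how many there are. A set-based sketch, with `common_titles` as a hypothetical helper name and a few titles copied from the list outputs in this notebook:

```python
def common_titles(titles_a, titles_b):
    """Return the set of titles that appear in both collections."""
    return set(titles_a) & set(titles_b)


# A few titles copied from the 'list1' and 'list2' outputs in this notebook.
budget_titles = ["Avatar", "Tangled", "John Carter"]
gross_titles = ["Toy Story 3", "Tangled", "Inception"]

shared = common_titles(budget_titles, gross_titles)
has_overlap = bool(shared)  # same yes/no answer as common_data
print(shared, len(shared), has_overlap)
```

Building sets avoids comparing every pair of titles, which matters for a list as long as the one taken from movie_basics_df.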
Creating a list of movie names from each dataframe to check commonality and see whether the dataframes can be merged using the 'movie' column.
# building one list of movie names per dataframe
list1 = list(movie_budgets_df.movie)
list2 = list(movie_gross_df.movie)
list3 = list(movie_basics_df.movie)
# note: list4 is built from movie_budgets_df, the same source as list1
list4 = list(movie_budgets_df.movie)
list5 = list(movie_df.movie)
Checking the number of elements in the 'list1' list.
len(list1)
5782
Printing the first 20 elements in the 'list1' list.
print(list1[:20])
['Avatar', 'Pirates of the Caribbean: On Stranger Tides', 'Dark Phoenix', 'Avengers: Age of Ultron', 'Star Wars Ep. VIII: The Last Jedi', 'Star Wars Ep. VII: The Force Awakens', 'Avengers: Infinity War', 'Pirates of the Caribbean: At Worldâ\x80\x99s End', 'Justice League', 'Spectre', 'The Dark Knight Rises', 'Solo: A Star Wars Story', 'The Lone Ranger', 'John Carter', 'Tangled', 'Spider-Man 3', 'Captain America: Civil War', 'Batman v Superman: Dawn of Justice', 'The Hobbit: An Unexpected Journey', 'Harry Potter and the Half-Blood Prince']
Checking the number of elements in the 'list2' list.
len(list2)
3387
Printing the first 20 elements in the 'list2' list.
print(list2[:20])
['Toy Story 3', 'Alice in Wonderland (2010)', 'Harry Potter and the Deathly Hallows Part 1', 'Inception', 'Shrek Forever After', 'The Twilight Saga: Eclipse', 'Iron Man 2', 'Tangled', 'Despicable Me', 'How to Train Your Dragon', 'Clash of the Titans (2010)', 'The Chronicles of Narnia: The Voyage of the Dawn Treader', "The King's Speech", 'Tron Legacy', 'The Karate Kid', 'Prince of Persia: The Sands of Time', 'Black Swan', 'Megamind', 'Robin Hood', 'The Last Airbender']
Checking the number of elements in the 'list3' list.
len(list3)
146144
Printing the first 20 elements in the 'list3' list.
print(list3[:20])
['Sunghursh', 'One Day Before the Rainy Season', 'The Other Side of the Wind', 'Sabse Bada Sukh', 'The Wandering Soap Opera', 'A Thin Life', 'Bigfoot', 'Joe Finds Grace', 'O Silêncio', 'Nema aviona za Zagreb', 'Pál Adrienn', 'So Much for Justice!', 'Cooper and Hemingway: The True Gen', 'Children of the Green Dragon', 'T.G.M. - osvoboditel', 'The Tragedy of Man', "How Huang Fei-hong Rescued the Orphan from the Tiger's Den", 'Heaven & Hell', 'The Final Journey', 'Los pájaros se van con la muerte']
Checking the number of elements in the 'list5' list.
len(list5)
7668
Printing the first 20 elements in the 'list5' list.
print(list5[:20])
['The Shining', 'The Blue Lagoon', 'Star Wars: Episode V - The Empire Strikes Back', 'Airplane!', 'Caddyshack', 'Friday the 13th', 'The Blues Brothers', 'Raging Bull', 'Superman II', 'The Long Riders', 'Any Which Way You Can', 'The Gods Must Be Crazy', 'Popeye', 'Ordinary People', 'Dressed to Kill', 'Somewhere in Time', 'Fame', '9 to 5', 'The Fog', 'Stir Crazy']
Checking if the dataframes have some elements in the movie column that are the same.
print(common_data(list5, list3),common_data(list1, list5),common_data(list5, list3),
common_data(list3, list4))
True True True True
This is the 'commonelem_set' function, which returns the elements that two lists have in common, so that they can be counted.
def commonelem_set(z, x):
    one = set(z)
    two = set(x)
    if (one & two):
        return ("There are common elements in both lists:", one & two)
    else:
        return ("There are no common elements")
Movie_budgets_df has 1238 movies in common with the Movie_gross_df dataframe.
len(commonelem_set(list1, list2)[1])
1238
Movie_budgets_df has 2312 movies in common with the Movie_basics_df dataframe.
len(commonelem_set(list1, list3)[1])
2312
Movie_budgets_df has 3551 movies in common with the Movie_df dataframe.
len(commonelem_set(list1, list5)[1])
3551
Movie_gross_df has 2605 movies in common with the Movie_basics_df dataframe.
len(commonelem_set(list2, list3)[1])
2605
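The counts above compare titles as exact strings, so a title such as 'Alice in Wonderland (2010)' only matches when the other dataframe spells it identically. The following is a minimal sketch of how titles could be normalized before counting overlaps. The helpers `normalize_title` and `count_common` are hypothetical and not part of the original notebook, and the sample titles are illustrative rather than taken from the full dataframes.

```python
import re

def normalize_title(title):
    """Lower-case a title and drop a trailing '(YYYY)' year tag (hypothetical helper)."""
    title = re.sub(r"\s*\(\d{4}\)\s*$", "", title)
    return title.strip().casefold()

def count_common(titles_a, titles_b):
    """Count distinct titles found in both lists after normalization."""
    set_a = {normalize_title(t) for t in titles_a}
    set_b = {normalize_title(t) for t in titles_b}
    return len(set_a & set_b)

# Illustrative titles only, not the full dataframes
budgets_sample = ["Alice in Wonderland", "Tangled", "Avatar"]
gross_sample = ["Alice in Wonderland (2010)", "Tangled", "Inception"]

exact = len(set(budgets_sample) & set(gross_sample))    # exact string matches
normalized = count_common(budgets_sample, gross_sample)  # matches after cleanup
print(exact, normalized)
```

Whether stripping the year tag is appropriate depends on the data, since remakes can share a title; that trade-off is not settled here.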
Merging movie_basics_df dataframe with movie_ratings_df dataframe using the movie_id column to create the movie_rating_basics dataframe.
movie_rating_basics = movie_ratings_df.merge(movie_basics_df,on='movie_id')
Checking the dataframe and getting the first five rows of the dataframe
movie_rating_basics.head()
| | movie_id | averagerating | numvotes | movie | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|---|---|
| 0 | tt10356526 | 8.3 | 31 | Laiye Je Yaarian | Laiye Je Yaarian | 2019 | 117.0 | Romance |
| 1 | tt10384606 | 8.9 | 559 | Borderless | Borderless | 2019 | 87.0 | Documentary |
| 2 | tt1042974 | 6.4 | 20 | Just Inès | Just Inès | 2010 | 90.0 | Drama |
| 3 | tt1043726 | 4.2 | 50352 | The Legend of Hercules | The Legend of Hercules | 2014 | 99.0 | Action,Adventure,Fantasy |
| 4 | tt1060240 | 6.5 | 21 | Até Onde? | Até Onde? | 2011 | 73.0 | Mystery,Thriller |
Merging movie_budgets_df dataframe with movie_gross_df dataframe using the movie column to create the df1 dataframe.
df1 = movie_budgets_df.merge(movie_gross_df,on='movie')
Checking the dataframe and getting the first five rows of the dataframe
df1.head()
| | id | release_date | movie | production_budget | domestic_gross_x | worldwide_gross | studio | domestic_gross_y | foreign_gross | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 | BV | 241100000.0 | 804600000 | 2011 |
| 1 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 | BV | 459000000.0 | 946400000 | 2015 |
| 2 | 7 | Apr 27, 2018 | Avengers: Infinity War | $300,000,000 | $678,815,482 | $2,048,134,200 | BV | 678800000.0 | 1,369.5 | 2018 |
| 3 | 9 | Nov 17, 2017 | Justice League | $300,000,000 | $229,024,295 | $655,945,209 | WB | 229000000.0 | 428900000 | 2017 |
| 4 | 10 | Nov 6, 2015 | Spectre | $300,000,000 | $200,074,175 | $879,620,923 | Sony | 200100000.0 | 680600000 | 2015 |
Merging df1 dataframe with movie_rating_basics dataframe using the movie column to create the df2 dataframe.
df2 = df1.merge(movie_rating_basics,on='movie')
Checking the dataframe and getting the first five rows of the dataframe
df2.head()
| | id | release_date | movie | production_budget | domestic_gross_x | worldwide_gross | studio | domestic_gross_y | foreign_gross | year | movie_id | averagerating | numvotes | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 | BV | 241100000.0 | 804600000 | 2011 | tt1298650 | 6.6 | 447624 | Pirates of the Caribbean: On Stranger Tides | 2011 | 136.0 | Action,Adventure,Fantasy |
| 1 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 | BV | 459000000.0 | 946400000 | 2015 | tt2395427 | 7.3 | 665594 | Avengers: Age of Ultron | 2015 | 141.0 | Action,Adventure,Sci-Fi |
| 2 | 7 | Apr 27, 2018 | Avengers: Infinity War | $300,000,000 | $678,815,482 | $2,048,134,200 | BV | 678800000.0 | 1,369.5 | 2018 | tt4154756 | 8.5 | 670926 | Avengers: Infinity War | 2018 | 149.0 | Action,Adventure,Sci-Fi |
| 3 | 9 | Nov 17, 2017 | Justice League | $300,000,000 | $229,024,295 | $655,945,209 | WB | 229000000.0 | 428900000 | 2017 | tt0974015 | 6.5 | 329135 | Justice League | 2017 | 120.0 | Action,Adventure,Fantasy |
| 4 | 10 | Nov 6, 2015 | Spectre | $300,000,000 | $200,074,175 | $879,620,923 | Sony | 200100000.0 | 680600000 | 2015 | tt2379713 | 6.8 | 352504 | Spectre | 2015 | 148.0 | Action,Adventure,Thriller |
Merging df2 dataframe with movie_df dataframe using the movie column to create the df3 dataframe. This is the last merge.
df3 = df2.merge(movie_df,on='movie')
Checking the dataframe and getting the first five rows of the dataframe
df3.head()
| | id | release_date | movie | production_budget | domestic_gross_x | worldwide_gross | studio | domestic_gross_y | foreign_gross | year_x | ... | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 | BV | 459000000.0 | 946400000 | 2015 | ... | 7.3 | 777000.0 | Joss Whedon | Joss Whedon | Robert Downey Jr. | United States | 250000000.0 | 1.402810e+09 | Marvel Studios | 141.0 |
| 1 | 7 | Apr 27, 2018 | Avengers: Infinity War | $300,000,000 | $678,815,482 | $2,048,134,200 | BV | 678800000.0 | 1,369.5 | 2018 | ... | 8.4 | 897000.0 | Anthony Russo | Christopher Markus | Robert Downey Jr. | United States | 321000000.0 | 2.048360e+09 | Marvel Studios | 149.0 |
| 2 | 9 | Nov 17, 2017 | Justice League | $300,000,000 | $229,024,295 | $655,945,209 | WB | 229000000.0 | 428900000 | 2017 | ... | 6.1 | 418000.0 | Zack Snyder | Jerry Siegel | Ben Affleck | United States | 300000000.0 | 6.579270e+08 | Warner Bros. | 120.0 |
| 3 | 10 | Nov 6, 2015 | Spectre | $300,000,000 | $200,074,175 | $879,620,923 | Sony | 200100000.0 | 680600000 | 2015 | ... | 6.8 | 393000.0 | Sam Mendes | John Logan | Daniel Craig | United Kingdom | 245000000.0 | 8.806815e+08 | B24 | 148.0 |
| 4 | 11 | Jul 20, 2012 | The Dark Knight Rises | $275,000,000 | $448,139,099 | $1,084,439,099 | WB | 448100000.0 | 636800000 | 2012 | ... | 8.4 | 1600000.0 | Christopher Nolan | Jonathan Nolan | Christian Bale | United Kingdom | 250000000.0 | 1.081143e+09 | Warner Bros. | 164.0 |
5 rows × 31 columns
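The merges above join on the movie title rather than on a unique identifier, so a title that occurs more than once in either dataframe produces one output row per pairing. The sketch below illustrates this with small toy frames (the frame names and the keep-first policy are assumptions for illustration, not the notebook's method). It uses the `validate` argument of `DataFrame.merge` to detect a non-unique key and `drop_duplicates` as one possible way to collapse repeats.

```python
import pandas as pd

# Toy frames for illustration, not the project dataframes
budgets = pd.DataFrame({"movie": ["Hugo", "Gravity"],
                        "production_budget": [180000000, 110000000]})
ratings = pd.DataFrame({"movie": ["Hugo", "Hugo", "Gravity"],
                        "averagerating": [7.5, 7.9, 7.7]})

# An inner merge on a non-unique key returns one row per matching pair
merged = budgets.merge(ratings, on="movie")
print(len(merged))

# validate= raises MergeError when the key is not unique as declared
try:
    budgets.merge(ratings, on="movie", validate="one_to_one")
    key_is_unique = True
except pd.errors.MergeError:
    key_is_unique = False

# One possible policy (an assumption): keep the first row per title
deduped = merged.drop_duplicates(subset="movie", keep="first")
```

Which duplicate to keep is a judgment call; keeping the first row is only the simplest option.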
Information of all the dataframes merged to create df3.
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 31 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   id                 1262 non-null   int64
 1   release_date       1262 non-null   object
 2   movie              1262 non-null   object
 3   production_budget  1262 non-null   object
 4   domestic_gross_x   1262 non-null   object
 5   worldwide_gross    1262 non-null   object
 6   studio             1262 non-null   object
 7   domestic_gross_y   1262 non-null   float64
 8   foreign_gross      1128 non-null   object
 9   year_x             1262 non-null   int64
 10  movie_id           1262 non-null   object
 11  averagerating      1262 non-null   float64
 12  numvotes           1262 non-null   int64
 13  original_title     1262 non-null   object
 14  start_year         1262 non-null   int64
 15  runtime_minutes    1237 non-null   float64
 16  genres             1254 non-null   object
 17  rating             1262 non-null   object
 18  genre              1262 non-null   object
 19  year_y             1262 non-null   int64
 20  released           1261 non-null   object
 21  score              1262 non-null   float64
 22  votes              1262 non-null   float64
 23  director           1262 non-null   object
 24  writer             1262 non-null   object
 25  star               1262 non-null   object
 26  country            1261 non-null   object
 27  budget             1183 non-null   float64
 28  gross              1261 non-null   float64
 29  company            1261 non-null   object
 30  runtime            1261 non-null   float64
dtypes: float64(8), int64(5), object(18)
memory usage: 315.5+ KB
Dropping unwanted columns from dataframe df3.
df3 = df3.drop(['id', 'movie_id', 'numvotes', 'original_title', 'start_year', 'genres',
'year_y','released','score','votes','country','gross','runtime_minutes',
'budget','domestic_gross_y'], axis=1)
Dropping the 'year_x' column from dataframe df3 as well.
df3= df3.drop(['year_x'], axis=1)
Checking that the columns were removed.
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   release_date       1262 non-null   object
 1   movie              1262 non-null   object
 2   production_budget  1262 non-null   object
 3   domestic_gross_x   1262 non-null   object
 4   worldwide_gross    1262 non-null   object
 5   studio             1262 non-null   object
 6   foreign_gross      1128 non-null   object
 7   averagerating      1262 non-null   float64
 8   rating             1262 non-null   object
 9   genre              1262 non-null   object
 10  director           1262 non-null   object
 11  writer             1262 non-null   object
 12  star               1262 non-null   object
 13  company            1261 non-null   object
 14  runtime            1261 non-null   float64
dtypes: float64(2), object(13)
memory usage: 157.8+ KB
After merging all the csv files and dropping some columns, the remaining columns are modified to create the final dataframe that will be used in this analysis. The result is the Drama dataframe, the finished dataframe produced by editing all the other dataframes.
Creating the Production_Budget column by converting the production budget column of the df3 dataframe from currency strings to integers.
storage1 = []
storage2 = []
production_budget_x = []
# remove the '$' sign
for i in df3.production_budget:
    storage1.append(i.replace('$',''))
# remove the thousands separators
for i in storage1:
    storage2.append(i.replace(',',''))
# convert the remaining digits to int
for i in storage2:
    i = int(i)
    production_budget_x.append(i)
The 'storage1' list, which holds the budget strings with the '$' sign removed.
print(storage1[:40])
['330,600,000', '300,000,000', '300,000,000', '300,000,000', '275,000,000', '275,000,000', '275,000,000', '275,000,000', '260,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '250,000,000', '230,000,000', '225,000,000', '220,000,000', '220,000,000', '217,000,000', '215,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '210,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '99,000,000', '200,000,000', '200,000,000']
The 'production_budget_x' list, which is the result of converting the strings to integers.
print(production_budget_x[:40])
[330600000, 300000000, 300000000, 300000000, 275000000, 275000000, 275000000, 275000000, 260000000, 250000000, 250000000, 250000000, 250000000, 250000000, 250000000, 230000000, 225000000, 220000000, 220000000, 217000000, 215000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 210000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 99000000, 200000000, 200000000]
Checking the number of elements in the 'production_budget_x' list.
len(production_budget_x)
1262
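The three loops above strip the '$' sign, strip the commas, and convert to int. As a sketch of an alternative, the same conversion can be written with pandas string methods in a single expression. The sample values below are illustrative stand-ins for `df3.production_budget`, and the variable names are assumptions.

```python
import pandas as pd

# Illustrative values in the same format as df3.production_budget
budget_text = pd.Series(["$330,600,000", "$300,000,000", "$99,000,000"])

# Remove '$' and ',' in one pass, then convert the digits to integers
budget_int = budget_text.str.replace(r"[$,]", "", regex=True).astype("int64")
print(budget_int.tolist())
```

Applied to the real dataframe, the result could be assigned straight to a new column instead of going through intermediate lists.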
Creating the Domestic_Gross column by converting the domestic gross column of the df3 dataframe from currency strings to integers.
storage1 = []
storage2 = []
domestic_gross_y = []
# remove the '$' sign
for i in df3.domestic_gross_x:
    storage1.append(i.replace('$',''))
# remove the thousands separators
for i in storage1:
    storage2.append(i.replace(',',''))
# convert the remaining digits to int
for i in storage2:
    i = int(i)
    domestic_gross_y.append(i)
The 'storage1' list, which holds the domestic gross strings with the '$' sign removed.
print(storage1[:40])
['459,005,868', '678,815,482', '229,024,295', '200,074,175', '448,139,099', '213,767,512', '89,302,115', '73,058,679', '200,821,936', '408,084,349', '330,360,194', '303,003,568', '258,366,855', '255,119,788', '225,764,765', '172,558,876', '291,045,518', '262,030,663', '65,233,400', '130,168,683', '652,270,625', '245,439,076', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '105,487,148', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '30,824,628', '700,059,566', '608,581,744']
The 'domestic_gross_y' list, which is the result of converting the strings to integers.
print(domestic_gross_y[:40])
[459005868, 678815482, 229024295, 200074175, 448139099, 213767512, 89302115, 73058679, 200821936, 408084349, 330360194, 303003568, 258366855, 255119788, 225764765, 172558876, 291045518, 262030663, 65233400, 130168683, 652270625, 245439076, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 105487148, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 30824628, 700059566, 608581744]
Checking the number of elements in the 'domestic_gross_y' list.
len(domestic_gross_y)
1262
Creating the Worldwide_Gross column by converting the worldwide gross column of the df3 dataframe from currency strings to integers.
storage1 = []
storage2 = []
worldwide_gross_x = []
# remove the '$' sign
for i in df3.worldwide_gross:
    storage1.append(i.replace('$',''))
# remove the thousands separators
for i in storage1:
    storage2.append(i.replace(',',''))
# convert the remaining digits to int
for i in storage2:
    i = int(i)
    worldwide_gross_x.append(i)
The 'storage1' list, which holds the worldwide gross strings with the '$' sign removed.
print(storage1[:40])
['1,403,013,963', '2,048,134,200', '655,945,209', '879,620,923', '1,084,439,099', '393,151,347', '260,002,115', '282,778,100', '586,477,240', '1,140,069,413', '867,500,281', '1,017,003,568', '960,366,855', '945,577,621', '1,234,846,267', '788,241,137', '667,999,518', '757,890,267', '313,477,717', '602,893,340', '1,648,854,864', '1,104,039,076', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '322,459,006', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '84,747,441', '1,348,258,224', '1,242,520,711']
The 'worldwide_gross_x' list, which is the result of converting the strings to integers.
print(worldwide_gross_x[:40])
[1403013963, 2048134200, 655945209, 879620923, 1084439099, 393151347, 260002115, 282778100, 586477240, 1140069413, 867500281, 1017003568, 960366855, 945577621, 1234846267, 788241137, 667999518, 757890267, 313477717, 602893340, 1648854864, 1104039076, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 322459006, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 84747441, 1348258224, 1242520711]
Checking the number of elements in the 'worldwide_gross_x' list.
len(worldwide_gross_x)
1262
Creating a function that checks if an element is 'NaN'.
def isNaN(num):
    # NaN is the only value that is not equal to itself
    return num != num
isNaN(8)  # testing the function
False
Creating the Foreign_Gross_x column by converting the foreign gross column of the df3 dataframe from numbers to currency strings.
import math

demo = []
foreign_gross_x = []
# first pass: turn the string entries into integers, keep missing values as they are
for i in df3.foreign_gross:
    if isinstance(i, str):
        if isNaN(i) == True:demo.append(i)
        if isNaN(i) == False:
            i = i.replace(",","")
            i = int(float(i))  # a value such as '1,369.5' becomes 1369
            demo.append(i)
    else:demo.append(i)
# second pass: format the numbers as currency strings, keep NaN unchanged
for i in demo:
    if math.isnan(i) == True:foreign_gross_x.append(i)
    if math.isnan(i) == False:
        foreign_gross_x.append("${:,.0f}".format(i))
The 'demo' list, which holds the foreign gross values as integers before they are formatted as currency.
print(demo[:40])
[946400000, 1369, 428900000, 680600000, 636800000, 179200000, 171200000, 211100000, 391000000, 745200000, 543300000, 718100000, 700000000, 700900000, 1010, 622300000, 377000000, 495900000, 237600000, 475300000, 1019, 858600000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 646900000, 634200000]
The 'foreign_gross_x' list, which is the result of formatting the integers as currency strings.
print(foreign_gross_x[:40])
['$946,400,000', '$1,369', '$428,900,000', '$680,600,000', '$636,800,000', '$179,200,000', '$171,200,000', '$211,100,000', '$391,000,000', '$745,200,000', '$543,300,000', '$718,100,000', '$700,000,000', '$700,900,000', '$1,010', '$622,300,000', '$377,000,000', '$495,900,000', '$237,600,000', '$475,300,000', '$1,019', '$858,600,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$216,400,000', '$646,900,000', '$634,200,000']
Checking the number of elements in the 'foreign_gross_x' list.
len(foreign_gross_x)
1262
Creating the Foreign_Gross column by turning the foreign gross column from df3 dataframe into integer
foreign_gross = []
for i in df3.foreign_gross:
    if isinstance(i, str):
        # remove the thousands separators and convert to int
        i = i.replace(",","")
        i = int(float(i))
        foreign_gross.append(i)
    else:foreign_gross.append(i)
The first 40 elements of the 'foreign_gross' list.
print(foreign_gross[:40])
[946400000, 1369, 428900000, 680600000, 636800000, 179200000, 171200000, 211100000, 391000000, 745200000, 543300000, 718100000, 700000000, 700900000, 1010, 622300000, 377000000, 495900000, 237600000, 475300000, 1019, 858600000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 216400000, 646900000, 634200000]
Checking the number of elements in the 'foreign_gross' list.
len(foreign_gross)
1262
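The foreign gross column mixes strings with thousands separators and missing values, which is why the loops above test each element's type. A minimal sketch of a vectorized alternative follows, using `pandas.to_numeric` with `errors="coerce"`; the sample values are illustrative and the variable names are assumptions. Unlike the loop, this keeps '1,369.5' as 1369.5 instead of truncating it to 1369. Values of that size look as if they may be recorded in a different unit from the other rows, but that is an assumption about the source data and is not handled here.

```python
import pandas as pd

# Illustrative values: plain digits, a value with a separator, and a missing entry
foreign_text = pd.Series(["946400000", "1,369.5", None], dtype="object")

# Strip the separators, then coerce anything unparseable to NaN
foreign_num = pd.to_numeric(
    foreign_text.str.replace(",", "", regex=False),
    errors="coerce",
)
print(foreign_num.tolist())
```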
After creating the columns, they are added to dataframe df3.
df3['foreign_gross']=foreign_gross
df3['worldwide_gross_x']=worldwide_gross_x
df3['foreign_gross_x']=foreign_gross_x
df3['production_budget_x']=production_budget_x
df3['domestic_gross_y']=domestic_gross_y
Creating the Profit column by subtracting the production budget column from the worldwide gross column of the df3 dataframe, which gives the profit of each movie as an integer.
profit = []
for x,y in enumerate(df3.worldwide_gross_x):
    profit.append(y-df3.production_budget_x[x])
print(profit[:40]) #showing the profit list
[1072413963, 1748134200, 355945209, 579620923, 809439099, 118151347, -14997885, 7778100, 326477240, 890069413, 617500281, 767003568, 710366855, 695577621, 984846267, 558241137, 442999518, 537890267, 93477717, 385893340, 1433854864, 894039076, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, 112459006, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, -14252559, 1148258224, 1042520711]
Checking the number of elements in the 'profit' list.
len(profit)
1262
Creating the Profit_x column by formatting the elements of the profit list as currency strings.
profit_x = []
for i in profit:
    profit_x.append("${:,.0f}".format(i))
print(profit_x[:40]) #showing the profit_x list
['$1,072,413,963', '$1,748,134,200', '$355,945,209', '$579,620,923', '$809,439,099', '$118,151,347', '$-14,997,885', '$7,778,100', '$326,477,240', '$890,069,413', '$617,500,281', '$767,003,568', '$710,366,855', '$695,577,621', '$984,846,267', '$558,241,137', '$442,999,518', '$537,890,267', '$93,477,717', '$385,893,340', '$1,433,854,864', '$894,039,076', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$112,459,006', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$-14,252,559', '$1,148,258,224', '$1,042,520,711']
Checking the number of elements in the 'profit_x' list.
len(profit_x)
1262
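The format string used above places the minus sign after the dollar sign for losses, as in '$-14,997,885'. If the more conventional '-$14,997,885' is preferred, a small helper can handle the sign separately. This is a sketch only; `format_currency` is a hypothetical name that does not appear in the original notebook.

```python
def format_currency(amount):
    """Format a number as whole dollars, placing any minus sign before '$'."""
    sign = "-" if amount < 0 else ""
    return "{}${:,.0f}".format(sign, abs(amount))

print(format_currency(1072413963))
print(format_currency(-14997885))
```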
Creating the Tickets column by dividing the worldwide gross column of df3 by 10, which is taken here as the average ticket price worldwide, to get the number of tickets sold for each movie.
no_tickets = []
for i in df3.worldwide_gross_x:
    no_tickets.append(round(i/10))
print(no_tickets[:40]) #showing the no_tickets list
[140301396, 204813420, 65594521, 87962092, 108443910, 39315135, 26000212, 28277810, 58647724, 114006941, 86750028, 101700357, 96036686, 94557762, 123484627, 78824114, 66799952, 75789027, 31347772, 60289334, 164885486, 110403908, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 32245901, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 8474744, 134825822, 124252071]
Checking the number of elements in the 'no_tickets' list.
len(no_tickets)
1262
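Both the profit and the ticket estimate are element-wise arithmetic on columns, so they can also be computed without explicit loops. The sketch below assumes a small stand-in frame with the same column names as df3; the constant name `AVERAGE_TICKET_PRICE` is an assumption introduced for readability.

```python
import pandas as pd

AVERAGE_TICKET_PRICE = 10  # the average ticket price assumed in the text above

# Small stand-in for df3 with two of its integer columns
df = pd.DataFrame({
    "worldwide_gross_x": [1403013963, 260002115],
    "production_budget_x": [330600000, 275000000],
})

# Profit is worldwide gross minus production budget, row by row
df["Profit"] = df["worldwide_gross_x"] - df["production_budget_x"]

# Estimated tickets sold, rounded to the nearest whole ticket
df["Tickets"] = (df["worldwide_gross_x"] / AVERAGE_TICKET_PRICE).round().astype("int64")
print(df)
```

Column arithmetic aligns rows by index, which avoids looking up each budget by position inside a loop.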
Creating the Tickets_x column by formatting the elements of the no_tickets list as strings with thousands separators.
str_tickets = []
for i in no_tickets:
    str_tickets.append("{:,.0f}".format(i))
print(str_tickets[:40]) #showing the str_tickets list
['140,301,396', '204,813,420', '65,594,521', '87,962,092', '108,443,910', '39,315,135', '26,000,212', '28,277,810', '58,647,724', '114,006,941', '86,750,028', '101,700,357', '96,036,686', '94,557,762', '123,484,627', '78,824,114', '66,799,952', '75,789,027', '31,347,772', '60,289,334', '164,885,486', '110,403,908', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '32,245,901', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '8,474,744', '134,825,822', '124,252,071']
Checking the number of elements in the 'str_tickets' list.
len(str_tickets)
1262
After creating more columns, then they are added to dataframe df3.
df3['Profit']=profit
df3['Profit_x']=profit_x
df3['Tickets']=no_tickets
df3['Tickets_x']=str_tickets
Checking the dataframe and getting the first five rows of the dataframe
df3.head()
| | release_date | movie | production_budget | domestic_gross_x | worldwide_gross | studio | foreign_gross | averagerating | rating | genre | ... | company | runtime | worldwide_gross_x | foreign_gross_x | production_budget_x | domestic_gross_y | Profit | Profit_x | Tickets | Tickets_x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 | BV | 946400000.0 | 7.3 | PG-13 | Action | ... | Marvel Studios | 141.0 | 1403013963 | $946,400,000 | 330600000 | 459005868 | 1072413963 | $1,072,413,963 | 140301396 | 140,301,396 |
| 1 | Apr 27, 2018 | Avengers: Infinity War | $300,000,000 | $678,815,482 | $2,048,134,200 | BV | 1369.0 | 8.5 | PG-13 | Action | ... | Marvel Studios | 149.0 | 2048134200 | $1,369 | 300000000 | 678815482 | 1748134200 | $1,748,134,200 | 204813420 | 204,813,420 |
| 2 | Nov 17, 2017 | Justice League | $300,000,000 | $229,024,295 | $655,945,209 | WB | 428900000.0 | 6.5 | PG-13 | Action | ... | Warner Bros. | 120.0 | 655945209 | $428,900,000 | 300000000 | 229024295 | 355945209 | $355,945,209 | 65594521 | 65,594,521 |
| 3 | Nov 6, 2015 | Spectre | $300,000,000 | $200,074,175 | $879,620,923 | Sony | 680600000.0 | 6.8 | PG-13 | Action | ... | B24 | 148.0 | 879620923 | $680,600,000 | 300000000 | 200074175 | 579620923 | $579,620,923 | 87962092 | 87,962,092 |
| 4 | Jul 20, 2012 | The Dark Knight Rises | $275,000,000 | $448,139,099 | $1,084,439,099 | WB | 636800000.0 | 8.4 | PG-13 | Action | ... | Warner Bros. | 164.0 | 1084439099 | $636,800,000 | 275000000 | 448139099 | 809439099 | $809,439,099 | 108443910 | 108,443,910 |
5 rows × 23 columns
Making sure the new columns were added with the expected number of entries and data types.
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1262 entries, 0 to 1261
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   release_date         1262 non-null   object
 1   movie                1262 non-null   object
 2   production_budget    1262 non-null   object
 3   domestic_gross_x     1262 non-null   object
 4   worldwide_gross      1262 non-null   object
 5   studio               1262 non-null   object
 6   foreign_gross        1128 non-null   float64
 7   averagerating        1262 non-null   float64
 8   rating               1262 non-null   object
 9   genre                1262 non-null   object
 10  director             1262 non-null   object
 11  writer               1262 non-null   object
 12  star                 1262 non-null   object
 13  company              1261 non-null   object
 14  runtime              1261 non-null   float64
 15  worldwide_gross_x    1262 non-null   int64
 16  foreign_gross_x      1128 non-null   object
 17  production_budget_x  1262 non-null   int64
 18  domestic_gross_y     1262 non-null   int64
 19  Profit               1262 non-null   int64
 20  Profit_x             1262 non-null   object
 21  Tickets              1262 non-null   int64
 22  Tickets_x            1262 non-null   object
dtypes: float64(3), int64(5), object(15)
memory usage: 268.9+ KB
Rearranging the columns in dataframe 'df3'.
df3 = df3[['movie','release_date','genre','rating','production_budget_x','production_budget',
'domestic_gross_y','domestic_gross_x','foreign_gross','foreign_gross_x','worldwide_gross',
'worldwide_gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','averagerating',
'company','studio','star','director','writer']]
Checking the dataframe and getting the first five rows of the dataframe
df3.head()
| | movie | release_date | genre | rating | production_budget_x | production_budget | domestic_gross_y | domestic_gross_x | foreign_gross | foreign_gross_x | ... | Profit_x | Tickets | Tickets_x | runtime | averagerating | company | studio | star | director | writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avengers: Age of Ultron | May 1, 2015 | Action | PG-13 | 330600000 | $330,600,000 | 459005868 | $459,005,868 | 946400000.0 | $946,400,000 | ... | $1,072,413,963 | 140301396 | 140,301,396 | 141.0 | 7.3 | Marvel Studios | BV | Robert Downey Jr. | Joss Whedon | Joss Whedon |
| 1 | Avengers: Infinity War | Apr 27, 2018 | Action | PG-13 | 300000000 | $300,000,000 | 678815482 | $678,815,482 | 1369.0 | $1,369 | ... | $1,748,134,200 | 204813420 | 204,813,420 | 149.0 | 8.5 | Marvel Studios | BV | Robert Downey Jr. | Anthony Russo | Christopher Markus |
| 2 | Justice League | Nov 17, 2017 | Action | PG-13 | 300000000 | $300,000,000 | 229024295 | $229,024,295 | 428900000.0 | $428,900,000 | ... | $355,945,209 | 65594521 | 65,594,521 | 120.0 | 6.5 | Warner Bros. | WB | Ben Affleck | Zack Snyder | Jerry Siegel |
| 3 | Spectre | Nov 6, 2015 | Action | PG-13 | 300000000 | $300,000,000 | 200074175 | $200,074,175 | 680600000.0 | $680,600,000 | ... | $579,620,923 | 87962092 | 87,962,092 | 148.0 | 6.8 | B24 | Sony | Daniel Craig | Sam Mendes | John Logan |
| 4 | The Dark Knight Rises | Jul 20, 2012 | Action | PG-13 | 275000000 | $275,000,000 | 448139099 | $448,139,099 | 636800000.0 | $636,800,000 | ... | $809,439,099 | 108443910 | 108,443,910 | 164.0 | 8.4 | Warner Bros. | WB | Christian Bale | Christopher Nolan | Jonathan Nolan |
5 rows × 23 columns
Renaming the columns in dataframe 'df3'.
df3.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
'Company','Studio','Star','Director','Writer']
Checking the dataframe and getting the first five rows of the dataframe
df3.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Studio | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avengers: Age of Ultron | May 1, 2015 | Action | PG-13 | 330600000 | $330,600,000 | 459005868 | $459,005,868 | 946400000.0 | $946,400,000 | ... | $1,072,413,963 | 140301396 | 140,301,396 | 141.0 | 7.3 | Marvel Studios | BV | Robert Downey Jr. | Joss Whedon | Joss Whedon |
| 1 | Avengers: Infinity War | Apr 27, 2018 | Action | PG-13 | 300000000 | $300,000,000 | 678815482 | $678,815,482 | 1369.0 | $1,369 | ... | $1,748,134,200 | 204813420 | 204,813,420 | 149.0 | 8.5 | Marvel Studios | BV | Robert Downey Jr. | Anthony Russo | Christopher Markus |
| 2 | Justice League | Nov 17, 2017 | Action | PG-13 | 300000000 | $300,000,000 | 229024295 | $229,024,295 | 428900000.0 | $428,900,000 | ... | $355,945,209 | 65594521 | 65,594,521 | 120.0 | 6.5 | Warner Bros. | WB | Ben Affleck | Zack Snyder | Jerry Siegel |
| 3 | Spectre | Nov 6, 2015 | Action | PG-13 | 300000000 | $300,000,000 | 200074175 | $200,074,175 | 680600000.0 | $680,600,000 | ... | $579,620,923 | 87962092 | 87,962,092 | 148.0 | 6.8 | B24 | Sony | Daniel Craig | Sam Mendes | John Logan |
| 4 | The Dark Knight Rises | Jul 20, 2012 | Action | PG-13 | 275000000 | $275,000,000 | 448139099 | $448,139,099 | 636800000.0 | $636,800,000 | ... | $809,439,099 | 108443910 | 108,443,910 | 164.0 | 8.4 | Warner Bros. | WB | Christian Bale | Christopher Nolan | Jonathan Nolan |
5 rows × 23 columns
The movies from the df3 dataframe are counted by genre.
# putting movies into groups
grouped = []
for i in df3.Genre:
    grouped.append(i)
grouped=Counter(grouped)
grouped
Counter({'Action': 404,
'Animation': 92,
'Adventure': 61,
'Drama': 200,
'Comedy': 243,
'Biography': 118,
'Horror': 63,
'Crime': 66,
'Mystery': 2,
'Romance': 1,
'Fantasy': 8,
'Sci-Fi': 3,
'Thriller': 1})
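`Counter` accepts any iterable, so the intermediate list built above is optional. The sketch below shows the shorter form on an illustrative list standing in for `df3.Genre`; the variable names are assumptions.

```python
from collections import Counter

# Illustrative genre labels standing in for df3.Genre
genres = ["Action", "Drama", "Action", "Comedy", "Drama", "Action"]

# Counter tallies the labels directly, no explicit loop needed
genre_counts = Counter(genres)

# most_common(n) returns the n largest groups, largest first
top_two = genre_counts.most_common(2)
print(top_two)
```

On a pandas column, `df3.Genre.value_counts()` should give a comparable tally sorted by frequency.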
The movies from the df3 dataframe are counted by system rating.
# putting movies into groups
grouped1 = []
for i in df3.Rating:
    grouped1.append(i)
grouped1= Counter(grouped1)
grouped1
Counter({'PG-13': 534,
'PG': 161,
'G': 9,
'R': 544,
'Not Rated': 11,
'Unrated': 1,
'NC-17': 2})
Getting the index of all the drama genre movies in the df3 dataframe
drama_index = []
for i,x in enumerate(df3.Genre):
    if x == 'Drama':drama_index.append(i)
print(drama_index) #showing the drama_index list
[65, 66, 128, 188, 203, 273, 308, 322, 328, 347, 349, 365, 368, 373, 374, 376, 380, 391, 398, 407, 412, 417, 418, 419, 421, 422, 428, 447, 463, 475, 508, 511, 512, 514, 520, 526, 527, 528, 530, 531, 532, 534, 537, 540, 542, 575, 582, 590, 593, 605, 608, 610, 629, 646, 659, 675, 681, 686, 690, 694, 696, 715, 716, 741, 746, 751, 755, 756, 767, 768, 772, 773, 776, 781, 783, 797, 798, 808, 809, 810, 820, 821, 834, 850, 851, 852, 853, 854, 857, 867, 868, 869, 873, 878, 888, 902, 906, 907, 908, 909, 910, 911, 917, 918, 925, 929, 934, 936, 937, 939, 966, 970, 971, 972, 973, 975, 978, 979, 980, 983, 992, 995, 1005, 1006, 1030, 1031, 1032, 1037, 1038, 1040, 1041, 1050, 1053, 1070, 1072, 1073, 1074, 1079, 1081, 1083, 1084, 1087, 1105, 1106, 1107, 1108, 1121, 1123, 1125, 1130, 1132, 1136, 1138, 1139, 1140, 1142, 1143, 1145, 1146, 1148, 1149, 1151, 1152, 1154, 1155, 1157, 1158, 1162, 1166, 1173, 1177, 1178, 1182, 1187, 1198, 1205, 1207, 1209, 1210, 1211, 1213, 1215, 1216, 1217, 1219, 1220, 1229, 1232, 1233, 1239, 1243, 1244, 1245, 1246, 1254, 1255, 1256, 1257, 1258, 1261]
Checking the number of elements in the 'drama_index' list.
len(drama_index)
200
Pulling the rows of the df3 dataframe at the positions stored in drama_index, i.e. the rows that belong to the Drama genre. This is used to create the demo_df dataframe.
demo_df = df3.iloc[drama_index]
Resetting the index of demo_df dataframe.
demo_df = demo_df.reset_index(drop=True)
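As a cross-check on the index-based selection, the same subset can be taken with a boolean mask. This is a minimal sketch on a made-up frame (`toy_df` and its titles are hypothetical, not rows of df3); it compares the loop-plus-`iloc` route with the mask route.

```python
import pandas as pd

# Hypothetical stand-in for df3 with invented titles.
toy_df = pd.DataFrame(
    {
        "Movie": ["Film A", "Film B", "Film C", "Film D"],
        "Genre": ["Drama", "Action", "Drama", "Comedy"],
    }
)

# Position-based approach, as in the cells above.
drama_index = [i for i, genre in enumerate(toy_df["Genre"]) if genre == "Drama"]
by_position = toy_df.iloc[drama_index].reset_index(drop=True)

# Equivalent boolean-mask approach: keep rows where the condition is True.
by_mask = toy_df[toy_df["Genre"] == "Drama"].reset_index(drop=True)

print(by_mask)
```

Both routes should yield identical frames; the mask version simply avoids building the intermediate list of positions.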
The new dataframe demo_df.
demo_df
| Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Studio | Star | Director | Writer | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Par. | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | Hugo | Nov 23, 2011 | Drama | PG | 180000000 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.9 | Paramount Pictures | Par. | Asa Butterfield | Martin Scorsese | John Logan |
| 2 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | ... | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Uni. | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 3 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | ... | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | WB | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 4 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | ... | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Wein. | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | Like Crazy | Oct 28, 2011 | Drama | PG-13 | 250000 | $250,000 | 3395391 | $3,395,391 | 336000.0 | $336,000 | ... | $3,478,400 | 372840 | 372,840 | 86.0 | 7.2 | Paramount Vantage | ParV | Felicity Jones | Drake Doremus | Drake Doremus |
| 196 | The Canyons | Aug 2, 2013 | Drama | R | 250000 | $250,000 | 59671 | $59,671 | NaN | NaN | ... | $-187,625 | 6238 | 6,238 | 99.0 | 3.8 | Prettybird | IFC | Lindsay Lohan | Paul Schrader | Bret Easton Ellis |
| 197 | Another Earth | Jul 22, 2011 | Drama | PG-13 | 175000 | $175,000 | 1321194 | $1,321,194 | 456000.0 | $456,000 | ... | $1,927,779 | 210278 | 210,278 | 92.0 | 7.0 | Artists Public Domain | FoxS | Brit Marling | Mike Cahill | Mike Cahill |
| 198 | Sound of My Voice | Apr 27, 2012 | Drama | R | 135000 | $135,000 | 408015 | $408,015 | NaN | NaN | ... | $294,448 | 42945 | 42,945 | 85.0 | 6.6 | Skyscraper Films | FoxS | Christopher Denham | Zal Batmanglij | Zal Batmanglij |
| 199 | A Ghost Story | Jul 7, 2017 | Drama | R | 100000 | $100,000 | 1594798 | $1,594,798 | NaN | NaN | ... | $2,669,782 | 276978 | 276,978 | 92.0 | 6.8 | Sailor Bear | A24 | Casey Affleck | David Lowery | David Lowery |
200 rows × 23 columns
Checking whether the dataframe has duplicate rows and deleting the rows that are duplicated.
Getting all the names of the movies from the demo_df dataframe to detect duplication.
demo_name = []
for i,x in enumerate(demo_df.Movie):demo_name.append(x)
print(demo_name) #showing the demo_name list
['Hugo', 'Hugo', 'The Wolfman', 'Gravity', 'Django Unchained', 'Sing', 'Downsizing', 'Gone Girl', 'Contagion', 'Trouble with the Curve', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Burlesque', 'Burlesque', 'Crimson Peak', 'Zero Dark Thirty', 'Creed II', 'The Post', 'Hereafter', 'Dream House', 'Upside Down', 'Upside Down', 'Upside Down', 'Anna Karenina', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Fifty Shades of Grey', 'Bridge of Spies', 'The Impossible', 'Paranoia', 'Paranoia', 'Victor Frankenstein', 'Water for Elephants', 'The Master', 'The Master', 'The Master', 'Creed', 'Creed', 'Creed', 'Dolphin Tale', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'Biutiful', 'The Longest Ride', 'Step Up Revolution', 'Flight', 'Extraordinary Measures', 'The Vow', 'The Age of Adaline', 'The Space Between Us', 'Safe Haven', 'Anonymous', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Tulip Fever', 'Fences', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'Stone', 'Stone', 'For Colored Girls', 'The Beaver', 'Wonder', 'The Last Song', 'Me Before You', 'The Debt', 'The Debt', 'The Light Between Oceans', 'Let Me In', 'Let Me In', 'By the Sea', 'By the Sea', 'The Book Thief', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'A Quiet Place', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'Remember Me', 'The Homesman', 'The Immigrant', 'The Woman in Black', 'Country Strong', 'One Day', 'One Day', 'One Day', 'One Day', 'One Day', 'One Day', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Suffragette', 'Black Swan', 'Ex Machina', 'The Perks of Being a Wallflower', 'Room', 'Chloe', 'Project Almanac', 'If Beale Street Could Talk', 'Wish Upon', 'Arbitrage', 'Stoker', 'Carol', 'If I Stay', 'Brooklyn', 'Brooklyn', 'Quartet', 'Hereditary', 'Everything, Everything', 'Mud', 'Mud', 'Coriolanus', 'Coriolanus', 'Amour', 'Melancholia', 'Melancholia', 'Ouija: Origin of Evil', 'Black or 
White', 'Manchester by the Sea', 'Yeh Jawaani Hai Deewani', 'The Bye Bye Man', 'Gifted', 'Gifted', 'Gifted', 'We Need to Talk About Kevin', 'Hesher', 'Shame', 'Shame', 'The Words', 'Lights Out', 'Lights Out', 'Lights Out', 'Lights Out', 'Still Alice', 'Addicted', 'Before I Fall', 'Everything Must Go', 'Rabbit Hole', 'Mommy', 'Take Shelter', 'Maggie', 'Maggie', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Anna', 'Boyhood', 'Stake Land', 'The Witch', 'Margin Call', 'Whiplash', 'War Room', 'Before Midnight', 'Ida', 'Courageous', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'The Babadook', 'Knock Knock', 'Knock Knock', 'Buried', 'Buried', 'The Lunchbox', 'Unsane', 'Mustang', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'I Origins', 'The Invitation', 'Like Crazy', 'Like Crazy', 'The Canyons', 'Another Earth', 'Sound of My Voice', 'A Ghost Story']
The function 'list_duplicates' that finds duplicated elements and puts them in a list.
def list_duplicates(seq):
    tally = defaultdict(list)
    for i,item in enumerate(seq):
        tally[item].append(i)
    return ((key,locs) for key,locs in tally.items()
                            if len(locs)>1)
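To make the behaviour of 'list_duplicates' concrete, here is a small worked example on a short list of titles. The function body mirrors the one above; the `titles` list and the `to_drop` name are chosen only for this illustration.

```python
from collections import defaultdict


def list_duplicates(seq):
    """Return a generator of (item, positions) for items seen more than once."""
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    return ((key, locs) for key, locs in tally.items() if len(locs) > 1)


titles = ["Hugo", "Hugo", "The Wolfman", "Gravity", "Creed", "Creed", "Creed"]

# Sorting orders the (title, positions) pairs alphabetically by title.
dups = sorted(list_duplicates(titles))
print(dups)  # [('Creed', [4, 5, 6]), ('Hugo', [0, 1])]

# Keep the first occurrence of each title and collect the rest for dropping.
to_drop = [pos for _, locs in dups for pos in locs[1:]]
print(to_drop)  # [5, 6, 1]
```

Titles that occur once ('The Wolfman', 'Gravity') never appear in the result, and slicing each position list with `[1:]` is what preserves the first copy.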
Using the 'list_duplicates' function to get all the duplicated movie names in the 'demo_name' list.
demo_dup = []
for dup in sorted(list_duplicates(demo_name)):
    demo_dup.append(dup)
Showing all the duplicated elements within the demo_df Drama dataframe
demo_dup
[('A Quiet Place', [86, 87]),
('Anna', [155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166]),
('Anna Karenina', [24, 25]),
('Brooklyn', [117, 118]),
('Buried', [184, 185]),
('Burlesque', [13, 14]),
('By the Sea', [80, 81]),
('Coriolanus', [124, 125]),
('Creed', [38, 39, 40]),
('Gifted', [134, 135, 136]),
('Hugo', [0, 1]),
('Knock Knock', [182, 183]),
('Let Me In', [78, 79]),
('Lights Out', [142, 143, 144, 145]),
('Like Crazy', [194, 195]),
('Maggie', [153, 154]),
('Melancholia', [127, 128]),
('Mud', [122, 123]),
('One Day', [96, 97, 98, 99, 100, 101]),
('Paranoia', [31, 32]),
('Remember Me', [90, 91]),
('Shame', [139, 140]),
('Stone', [68, 69]),
('The Debt', [75, 76]),
('The Master', [35, 36, 37]),
('Upside Down', [21, 22, 23])]
Getting the index of the duplicated elements in the demo_df Drama dataframe
for i in demo_dup:print(i[1][1:])
[87] [156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166] [25] [118] [185] [14] [81] [125] [39, 40] [135, 136] [1] [183] [79] [143, 144, 145] [195] [154] [128] [123] [97, 98, 99, 100, 101] [32] [91] [140] [69] [76] [36, 37] [22, 23]
Putting the duplicated elements index in a list to drop the duplicated elements later on.
demo_dup_index = []
for i in demo_dup:demo_dup_index+=i[1][1:]
print(demo_dup_index) #showing the demo_dup_index list
[87, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 25, 118, 185, 14, 81, 125, 39, 40, 135, 136, 1, 183, 79, 143, 144, 145, 195, 154, 128, 123, 97, 98, 99, 100, 101, 32, 91, 140, 69, 76, 36, 37, 22, 23]
Checking the number of elements in the 'demo_dup_index' list.
len(demo_dup_index)
46
Dropping all the duplicated elements in the demo_df Drama dataframe
demo_df = demo_df.drop(demo_dup_index)
Drama_df = demo_df.reset_index(drop=True) # resetting the index
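For comparison, pandas can perform the same keep-the-first-copy clean-up in one call. The sketch below is an assumption-laden illustration rather than the method used in this notebook: `toy_df` is a made-up frame, and the ratings for the repeated 'Creed' rows are invented.

```python
import pandas as pd

# Hypothetical frame with repeated titles; some rating values are invented.
toy_df = pd.DataFrame(
    {
        "Movie": ["Hugo", "Hugo", "Gravity", "Creed", "Creed", "Creed"],
        "Averagerating": [7.5, 7.9, 7.7, 7.6, 7.1, 6.9],
    }
)

# Treat rows with the same title as duplicates and keep only the first one.
deduped = toy_df.drop_duplicates(subset="Movie", keep="first").reset_index(drop=True)

print(deduped)
```

As with the manual approach, matching on the title alone would also merge distinct films that happen to share a name, so adding another column such as the release date to `subset` may be worth considering.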
Putting the movies ratings into groups
grouped1 = []
for i in Drama_df.Rating:
    grouped1.append(i)
grouped1= Counter(grouped1)
grouped1
Counter({'PG': 7, 'R': 67, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 1})
The distribution of movies across the system ratings is very uneven: 'PG': 7, 'R': 67, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 1. PG-13 has the highest number of movies, 76, and the objective is to make the distribution as even as possible. To achieve that, movies will be taken from the movie_df dataframe and added to the Drama_df dataframe for the remaining system ratings 'PG', 'R' and 'NC-17', with a target of 76 movies each.
Drama_df
| Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Studio | Star | Director | Writer | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Par. | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | ... | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Uni. | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | ... | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | WB | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | ... | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Wein. | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | ... | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Uni. | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 149 | Like Crazy | Oct 28, 2011 | Drama | PG-13 | 250000 | $250,000 | 3395391 | $3,395,391 | 336000.0 | $336,000 | ... | $3,478,400 | 372840 | 372,840 | 86.0 | 6.7 | Paramount Vantage | ParV | Felicity Jones | Drake Doremus | Drake Doremus |
| 150 | The Canyons | Aug 2, 2013 | Drama | R | 250000 | $250,000 | 59671 | $59,671 | NaN | NaN | ... | $-187,625 | 6238 | 6,238 | 99.0 | 3.8 | Prettybird | IFC | Lindsay Lohan | Paul Schrader | Bret Easton Ellis |
| 151 | Another Earth | Jul 22, 2011 | Drama | PG-13 | 175000 | $175,000 | 1321194 | $1,321,194 | 456000.0 | $456,000 | ... | $1,927,779 | 210278 | 210,278 | 92.0 | 7.0 | Artists Public Domain | FoxS | Brit Marling | Mike Cahill | Mike Cahill |
| 152 | Sound of My Voice | Apr 27, 2012 | Drama | R | 135000 | $135,000 | 408015 | $408,015 | NaN | NaN | ... | $294,448 | 42945 | 42,945 | 85.0 | 6.6 | Skyscraper Films | FoxS | Christopher Denham | Zal Batmanglij | Zal Batmanglij |
| 153 | A Ghost Story | Jul 7, 2017 | Drama | R | 100000 | $100,000 | 1594798 | $1,594,798 | NaN | NaN | ... | $2,669,782 | 276978 | 276,978 | 92.0 | 6.8 | Sailor Bear | A24 | Casey Affleck | David Lowery | David Lowery |
154 rows × 23 columns
Before adding movies from the movie_df dataframe, all of its drama movies will be grouped by system rating, to see if there are enough movies to add to the Drama_df dataframe.
grouped1 = []
for i,x in enumerate(movie_df.rating):
    if movie_df.genre[i] == 'Drama':grouped1.append(movie_df.rating[i])
grouped1=collections.Counter(grouped1)
grouped1
Counter({'R': 767,
'PG': 155,
nan: 35,
'G': 11,
'Not Rated': 113,
'PG-13': 390,
'Unrated': 27,
'NC-17': 14,
'X': 1,
'TV-PG': 2,
'TV-MA': 3})
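The loop above looks values up with `movie_df.genre[i]`, which is a label lookup and so relies on movie_df having the default 0..n-1 index. A sketch that avoids that dependence, using a made-up frame (`toy_movie_df` is hypothetical), filters first and then counts, keeping missing ratings visible with `dropna=False`.

```python
import pandas as pd

# Hypothetical stand-in for movie_df; None marks a missing rating.
toy_movie_df = pd.DataFrame(
    {
        "genre": ["Drama", "Drama", "Comedy", "Drama", "Drama"],
        "rating": ["R", "PG", "R", None, "R"],
    }
)

# Select the rating column for Drama rows only.
drama_ratings = toy_movie_df.loc[toy_movie_df["genre"] == "Drama", "rating"]

# dropna=False keeps the missing ratings as their own group.
rating_counts = drama_ratings.value_counts(dropna=False)

print(rating_counts)
```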
After grouping the movies by system rating, the ratings that did not have enough movies were 'PG', 'R', 'G' and 'NC-17'. 'R' needs 9 movies and 'PG' needs 59 movies. 'NC-17' needs 75 movies, but only 14 'NC-17' rated drama movies are available. 'G' needs 76 movies, as Drama_df does not have any, but only 11 'G' rated drama movies are available. Movies to be added for each system rating: 'PG': 59, 'R': 9, 'G': 11 and 'NC-17': 14.
Getting the index of 'PG' rated drama movies from movie_df dataframe to add to the Drama_df dataframe.
index_pg = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'PG' and movie_df.country[i] == 'United States':
        index_pg.append(i)
print(index_pg) #showing the index_pg list
[15, 24, 33, 38, 61, 63, 64, 114, 116, 119, 135, 150, 170, 225, 265, 297, 312, 339, 373, 382, 411, 458, 461, 473, 488, 503, 527, 560, 561, 583, 606, 621, 630, 663, 776, 793, 805, 815, 841, 897, 908, 1026, 1037, 1217, 1230, 1235, 1241, 1287, 1311, 1377, 1387, 1399, 1447, 1492, 1499, 1572, 1706, 1756, 1779, 1811, 1821, 1840, 1953, 2025, 2036, 2070, 2081, 2104, 2126, 2278, 2280, 2286, 2461, 2539, 2664, 2710, 2730, 2813, 2829, 2938, 2993, 3029, 3056, 3186, 3371, 3609, 3963, 4078, 4145, 4834, 4950, 4971, 4979, 4981, 5075, 5139, 5406, 5765, 5813, 5828, 6600, 6662, 6840, 7065, 7447, 7577]
Checking the number of elements in the 'index_pg' list.
len(index_pg)
106
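The genre, rating and country conditions can also be combined into a single boolean mask. The helper below, `drama_index_for`, is a hypothetical name introduced only for this sketch, and `toy_movie_df` is a made-up frame; the optional `country` argument reflects that some of the selections in this notebook filter on country and others do not.

```python
import pandas as pd


def drama_index_for(df, rating, country=None):
    """Return index labels of Drama rows with the given rating (hypothetical helper).

    With the default 0..n-1 index these labels equal row positions,
    so they can be passed to .iloc as in the surrounding cells.
    """
    mask = (df["genre"] == "Drama") & (df["rating"] == rating)
    if country is not None:
        mask = mask & (df["country"] == country)
    return df.index[mask].tolist()


# Made-up data for illustration only.
toy_movie_df = pd.DataFrame(
    {
        "movie": ["A", "B", "C", "D"],
        "genre": ["Drama", "Drama", "Comedy", "Drama"],
        "rating": ["PG", "PG", "PG", "R"],
        "country": ["United States", "France", "United States", "United States"],
    }
)

index_pg_us = drama_index_for(toy_movie_df, "PG", country="United States")
index_pg_any = drama_index_for(toy_movie_df, "PG")

print(index_pg_us, index_pg_any)
```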
Getting the index of 'R' rated drama movies from movie_df dataframe to add to the Drama_df dataframe.
index_r = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'R' and movie_df.country[i] == 'United States':
        index_r.append(i)
print(index_r) #showing the index_r list
[13, 16, 81, 107, 143, 146, 160, 174, 180, 181, 188, 193, 211, 236, 248, 257, 273, 285, 304, 343, 348, 368, 408, 409, 415, 427, 443, 450, 454, 525, 533, 544, 553, 570, 585, 608, 610, 616, 620, 627, 631, 654, 656, 676, 697, 698, 741, 757, 784, 795, 802, 852, 853, 856, 861, 868, 870, 876, 911, 924, 951, 961, 962, 992, 1006, 1035, 1040, 1057, 1143, 1166, 1181, 1188, 1196, 1197, 1207, 1208, 1226, 1236, 1247, 1274, 1283, 1336, 1337, 1339, 1343, 1358, 1365, 1391, 1405, 1408, 1416, 1418, 1423, 1428, 1442, 1457, 1483, 1518, 1564, 1570, 1574, 1575, 1594, 1608, 1635, 1652, 1660, 1665, 1685, 1687, 1723, 1735, 1736, 1738, 1750, 1752, 1760, 1761, 1787, 1819, 1831, 1832, 1834, 1857, 1867, 1872, 1876, 1903, 1916, 1925, 1930, 1944, 1964, 1965, 1975, 1979, 1998, 1999, 2031, 2033, 2034, 2044, 2049, 2050, 2053, 2065, 2080, 2085, 2138, 2155, 2159, 2182, 2189, 2191, 2217, 2218, 2227, 2231, 2261, 2276, 2293, 2299, 2315, 2323, 2332, 2337, 2338, 2346, 2361, 2368, 2369, 2383, 2413, 2425, 2432, 2439, 2443, 2454, 2457, 2476, 2478, 2501, 2502, 2514, 2536, 2552, 2606, 2660, 2662, 2703, 2747, 2758, 2771, 2775, 2782, 2783, 2793, 2810, 2860, 2863, 2865, 2869, 2889, 2936, 2942, 2966, 3003, 3007, 3010, 3011, 3036, 3039, 3043, 3044, 3049, 3121, 3127, 3141, 3150, 3164, 3165, 3166, 3176, 3183, 3188, 3197, 3201, 3236, 3239, 3244, 3245, 3264, 3291, 3302, 3313, 3320, 3366, 3377, 3391, 3398, 3401, 3413, 3450, 3458, 3460, 3461, 3497, 3498, 3505, 3552, 3563, 3567, 3572, 3581, 3583, 3621, 3634, 3700, 3740, 3744, 3762, 3776, 3789, 3793, 3795, 3796, 3832, 3833, 3850, 3855, 3903, 3925, 3941, 3943, 3961, 3974, 3987, 3994, 4024, 4032, 4039, 4045, 4074, 4105, 4109, 4118, 4126, 4148, 4167, 4197, 4241, 4262, 4272, 4333, 4357, 4361, 4411, 4438, 4449, 4450, 4468, 4480, 4489, 4494, 4505, 4547, 4568, 4599, 4621, 4649, 4672, 4762, 4763, 4796, 4797, 4810, 4818, 4822, 4831, 4856, 4909, 4914, 4920, 4961, 5010, 5043, 5070, 5106, 5163, 5167, 5187, 5189, 5198, 5217, 5218, 5223, 5225, 5234, 5242, 5258, 5272, 5275, 5276, 5317, 
5362, 5385, 5386, 5401, 5416, 5489, 5508, 5511, 5523, 5533, 5538, 5559, 5574, 5620, 5640, 5650, 5676, 5691, 5711, 5798, 5799, 5805, 5820, 5835, 5836, 5840, 5842, 5846, 5915, 5916, 5950, 5984, 5990, 5997, 6044, 6057, 6065, 6068, 6134, 6147, 6170, 6186, 6192, 6212, 6213, 6231, 6236, 6258, 6323, 6335, 6362, 6402, 6405, 6408, 6412, 6426, 6447, 6450, 6489, 6512, 6540, 6553, 6575, 6587, 6610, 6635, 6638, 6649, 6669, 6710, 6735, 6868, 6891, 6893, 6897, 6927, 6985, 7005, 7039, 7067, 7085, 7092, 7098, 7102, 7117, 7132, 7142, 7147, 7156, 7172, 7173, 7180, 7193, 7196, 7217, 7234, 7248, 7267, 7279, 7331, 7378, 7408, 7418, 7422, 7443, 7461, 7462, 7495, 7507, 7513, 7530, 7550, 7592, 7593, 7658, 7661]
Checking the number of elements in the 'index_r' list.
len(index_r)
460
Getting the index of 'G' rated drama movies from movie_df dataframe to add to the Drama_df dataframe.
index_g = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'G':index_g.append(i)
print(index_g) #showing the index_g list
[321, 629, 1124, 1218, 1622, 1901, 2283, 2580, 2706, 3624, 4146]
Checking the number of elements in the 'index_g' list.
len(index_g)
11
Getting the index of 'NC-17' rated drama movies from movie_df dataframe to add to the Drama_df dataframe.
index_nc = []
for i,x in enumerate(movie_df.genre):
    if x =='Drama' and movie_df.rating[i] == 'NC-17':index_nc.append(i)
print(index_nc) #showing the index_nc list
[926, 1946, 2170, 2393, 2653, 2661, 2856, 3175, 4257, 4609, 5112, 5872, 6029, 6256]
Checking the number of elements in the 'index_nc' list.
len(index_nc)
14
This is to help show what movies were already added to the Drama_df dataframe so duplicates are not created
for i,x in enumerate(Drama_df.Movie):
    if Drama_df.Rating[i]=='PG':print(x)
Hugo Dolphin Tale Extraordinary Measures Wonder The Last Song War Room The Lunchbox
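Rather than comparing the printed titles by eye, candidate movies can be checked against the existing ones programmatically. This is a minimal sketch with two small made-up frames (`existing` and `candidates` are hypothetical stand-ins for Drama_df and demo_pg) using `Series.isin`.

```python
import pandas as pd

# Hypothetical stand-in for the 'PG' rows already in Drama_df.
existing = pd.DataFrame(
    {"Movie": ["Hugo", "Wonder", "War Room"], "Rating": ["PG", "PG", "PG"]}
)

# Hypothetical stand-in for candidate rows taken from movie_df.
candidates = pd.DataFrame(
    {"movie": ["Somewhere in Time", "War Room", "Urban Cowboy"]}
)

# True where a candidate title is already present among the existing PG titles.
pg_titles = existing.loc[existing["Rating"] == "PG", "Movie"]
already_present = candidates["movie"].isin(pg_titles)

# Keep only candidates that would not create a duplicate.
new_only = candidates[~already_present].reset_index(drop=True)

print(new_only)
```

The comparison is an exact string match, so differences in capitalisation or punctuation between the two sources would not be caught.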
Turning the indices of the 'PG' rated movies into a dataframe called demo_pg.
demo_pg = movie_df.iloc[index_pg]
demo_pg = demo_pg.reset_index(drop=True)
Checking the dataframe and getting the first five rows of the dataframe
demo_pg.head()
| movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | 1980 | October 3, 1980 (United States) | 7.2 | 27000.0 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | United States | 5100000.0 | 9709597.0 | Rastar Pictures | 103.0 |
| 1 | Urban Cowboy | PG | Drama | 1980 | June 6, 1980 (United States) | 6.4 | 14000.0 | James Bridges | Aaron Latham | John Travolta | United States | NaN | 46918287.0 | Paramount Pictures | 132.0 |
| 2 | Cattle Annie and Little Britches | PG | Drama | 1980 | April 24, 1981 (United States) | 6.1 | 604.0 | Lamont Johnson | David Eyre | Scott Glenn | United States | 5100000.0 | 534816.0 | Cattle Annie Productions | 97.0 |
| 3 | The Jazz Singer | PG | Drama | 1980 | December 19, 1980 (United States) | 5.9 | 4000.0 | Richard Fleischer | Samson Raphaelson | Laurence Olivier | United States | NaN | 27118000.0 | EMI Films | 115.0 |
| 4 | The Competition | PG | Drama | 1980 | December 3, 1980 (United States) | 6.7 | 1900.0 | Joel Oliansky | Joel Oliansky | Richard Dreyfuss | United States | NaN | 14287755.0 | Rastar Films | 123.0 |
Turning the indices of the 'NC-17' rated movies into a dataframe called demo_nc.
demo_nc = movie_df.iloc[index_nc]
demo_nc = demo_nc.reset_index(drop=True)
Checking the demo_nc dataframe.
demo_nc
| movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | NC-17 | Drama | 1986 | March 7, 1986 (Spain) | 7.0 | 11000.0 | Pedro Almodóvar | Pedro Almodóvar | Assumpta Serna | Spain | NaN | 286126.0 | Compañía Iberoamericana de TV | 110.0 |
| 1 | Whore | NC-17 | Drama | 1991 | October 18, 1991 (United States) | 5.6 | 3500.0 | Ken Russell | David Hines | Theresa Russell | United States | NaN | 1008404.0 | Cheap Date | 85.0 |
| 2 | Tokyo Decadence | NC-17 | Drama | 1992 | April 30, 1993 (United States) | 6.0 | 3000.0 | Ryû Murakami | Ryû Murakami | Miho Nikaido | Japan | NaN | 277845.0 | Cinemabrain | 112.0 |
| 3 | Wide Sargasso Sea | NC-17 | Drama | 1993 | April 16, 1993 (United States) | 5.7 | 1900.0 | John Duigan | Jan Sharp | Karina Lombard | Australia | NaN | 1614784.0 | Laughing Kookaburra Productions | 98.0 |
| 4 | Kids | NC-17 | Drama | 1995 | September 1, 1995 (United States) | 7.1 | 75000.0 | Larry Clark | Harmony Korine | Leo Fitzpatrick | United States | 1500000.0 | 7412216.0 | Guys Upstairs | 91.0 |
| 5 | Showgirls | NC-17 | Drama | 1995 | September 22, 1995 (United States) | 4.9 | 64000.0 | Paul Verhoeven | Joe Eszterhas | Elizabeth Berkley | France | 45000000.0 | 20358624.0 | Carolco Pictures | 128.0 |
| 6 | Crash | NC-17 | Drama | 1996 | March 21, 1997 (United States) | 6.4 | 54000.0 | David Cronenberg | J.G. Ballard | James Spader | Canada | 9000000.0 | 2671291.0 | Alliance Communications Corporation | 100.0 |
| 7 | Bent | NC-17 | Drama | 1997 | November 26, 1997 (United States) | 7.2 | 7900.0 | Sean Mathias | Martin Sherman | Lothaire Bluteau | United Kingdom | NaN | 496059.0 | Channel Four Films | 105.0 |
| 8 | The Dreamers | NC-17 | Drama | 2003 | February 20, 2004 (United States) | 7.2 | 114000.0 | Bernardo Bertolucci | Gilbert Adair | Michael Pitt | United Kingdom | 15000000.0 | 24152155.0 | Recorded Picture Company (RPC) | 115.0 |
| 9 | Ma mère | NC-17 | Drama | 2004 | May 19, 2004 (France) | 5.1 | 6600.0 | Christophe Honoré | Georges Bataille | Isabelle Huppert | France | NaN | 1510052.0 | Gemini Films | 110.0 |
| 10 | Lust, Caution | NC-17 | Drama | 2007 | October 26, 2007 (United States) | 7.5 | 38000.0 | Ang Lee | Eileen Chang | Tony Chiu-Wai Leung | Taiwan | 15000000.0 | 67091915.0 | Haishang Films | 157.0 |
| 11 | Shame | NC-17 | Drama | 2011 | January 13, 2012 (United Kingdom) | 7.2 | 187000.0 | Steve McQueen | Steve McQueen | Michael Fassbender | United Kingdom | 6500000.0 | 19123767.0 | Fox Searchlight Pictures | 101.0 |
| 12 | Elles | NC-17 | Drama | 2011 | February 1, 2012 (France) | 5.6 | 6700.0 | Malgorzata Szumowska | Tine Byrckel | Juliette Binoche | France | NaN | 3822241.0 | Slot Machine | 99.0 |
| 13 | Blue Is the Warmest Colour | NC-17 | Drama | 2013 | October 9, 2013 (Belgium) | 7.7 | 142000.0 | Abdellatif Kechiche | Abdellatif Kechiche | Léa Seydoux | France | NaN | 19465835.0 | Quat'sous Films | 180.0 |
Turning the indices of the 'R' rated movies into a dataframe called demo_r.
demo_r = movie_df.iloc[index_r]
demo_r = demo_r.reset_index(drop=True)
demo_r = demo_r[:11]
Checking the demo_r dataframe
demo_r
| movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ordinary People | R | Drama | 1980 | September 19, 1980 (United States) | 7.7 | 49000.0 | Robert Redford | Judith Guest | Donald Sutherland | United States | 6000000.0 | 54766923.0 | Paramount Pictures | 124.0 |
| 1 | Fame | R | Drama | 1980 | May 16, 1980 (United States) | 6.6 | 21000.0 | Alan Parker | Christopher Gore | Eddie Barth | United States | NaN | 21202829.0 | Metro-Goldwyn-Mayer (MGM) | 134.0 |
| 2 | Windows | R | Drama | 1980 | January 18, 1980 (United States) | 4.8 | 643.0 | Gordon Willis | Barry Siegel | Talia Shire | United States | NaN | 2128395.0 | Mike Lobell Productions | 96.0 |
| 3 | Endless Love | R | Drama | 1981 | July 17, 1981 (United States) | 4.9 | 7600.0 | Franco Zeffirelli | Scott Spencer | Brooke Shields | United States | NaN | 32492674.0 | PolyGram Filmed Entertainment | 116.0 |
| 4 | Ghost Story | R | Drama | 1981 | December 18, 1981 (United States) | 6.3 | 7900.0 | John Irvin | Peter Straub | Craig Wasson | United States | NaN | 23371905.0 | Universal Pictures | 110.0 |
| 5 | One from the Heart | R | Drama | 1981 | February 11, 1982 (United States) | 6.5 | 5700.0 | Francis Ford Coppola | Armyan Bernstein | Frederic Forrest | United States | 26000000.0 | 636796.0 | Zoetrope Studios | 107.0 |
| 6 | The Hand | R | Drama | 1981 | April 24, 1981 (United States) | 5.5 | 5700.0 | Oliver Stone | Marc Brandel | Michael Caine | United States | NaN | 2447576.0 | Orion Pictures | 104.0 |
| 7 | Pennies from Heaven | R | Drama | 1981 | January 1, 1982 (United States) | 6.5 | 5300.0 | Herbert Ross | Dennis Potter | Steve Martin | United States | 22000000.0 | 9171289.0 | Metro-Goldwyn-Mayer (MGM) | 108.0 |
| 8 | Zoot Suit | R | Drama | 1981 | January 1, 1982 (United States) | 6.8 | 1100.0 | Luis Valdez | Luis Valdez | Daniel Valdez | United States | 2700000.0 | 3256082.0 | Universal Pictures | 103.0 |
| 9 | Rich and Famous | R | Drama | 1981 | October 9, 1981 (United States) | 5.9 | 1600.0 | George Cukor | Gerald Ayres | Jacqueline Bisset | United States | NaN | 14492125.0 | Jaquet | 117.0 |
| 10 | Raggedy Man | R | Drama | 1981 | September 18, 1981 (United States) | 6.8 | 1400.0 | Jack Fisk | William D. Wittliff | Sissy Spacek | United States | NaN | 1976198.0 | Universal Pictures | 94.0 |
Turning the indices of the 'G' rated movies into a dataframe called demo_g.
demo_g = movie_df.iloc[index_g]
demo_g = demo_g.reset_index(drop=True)
Checking the demo_g dataframe
demo_g
| movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | La traviata | G | Drama | 1982 | February 18, 1983 (Italy) | 7.2 | 1300.0 | Franco Zeffirelli | Francesco Maria Piave | Teresa Stratas | Netherlands | NaN | 3783329.0 | Accent Films B.V. | 109.0 |
| 1 | A Sunday in the Country | G | Drama | 1984 | April 11, 1984 (France) | 7.6 | 2500.0 | Bertrand Tavernier | Pierre Bost | Louis Ducreux | France | NaN | 2411143.0 | Films A2 | 90.0 |
| 2 | Babette's Feast | G | Drama | 1987 | March 4, 1988 (United States) | 7.8 | 19000.0 | Gabriel Axel | Karen Blixen | Stéphane Audran | Denmark | NaN | 4637920.0 | Panorama Film A/S | 103.0 |
| 3 | Little Dorrit | G | Drama | 1987 | October 21, 1988 (United States) | 7.3 | 1000.0 | Christine Edzard | Charles Dickens | Derek Jacobi | United Kingdom | NaN | 1025228.0 | Sands | 357.0 |
| 4 | Prancer | G | Drama | 1989 | November 17, 1989 (United States) | 6.4 | 4800.0 | John D. Hancock | Greg Taylor | Sam Elliott | United States | NaN | 18587135.0 | Cineplex Odeon Films | 103.0 |
| 5 | Wild Hearts Can't Be Broken | G | Drama | 1991 | May 24, 1991 (United States) | 7.2 | 5000.0 | Steve Miner | Matt Williams | Gabrielle Anwar | United States | NaN | 7294835.0 | Walt Disney Pictures | 88.0 |
| 6 | The Secret Garden | G | Drama | 1993 | August 13, 1993 (United States) | 7.3 | 38000.0 | Agnieszka Holland | Frances Hodgson Burnett | Kate Maberly | United Kingdom | 18000000.0 | 31181347.0 | Warner Bros. | 101.0 |
| 7 | Through the Olive Trees | G | Drama | 1994 | January 25, 1995 (France) | 7.8 | 7100.0 | Abbas Kiarostami | Abbas Kiarostami | Mohamad Ali Keshavarz | Iran | NaN | NaN | Abbas Kiarostami Productions | 103.0 |
| 8 | A Little Princess | G | Drama | 1995 | May 19, 1995 (United States) | 7.7 | 33000.0 | Alfonso Cuarón | Frances Hodgson Burnett | Liesel Matthews | United States | 17000000.0 | 10015449.0 | Warner Bros. | 97.0 |
| 9 | The Winslow Boy | G | Drama | 1999 | October 29, 1999 (United Kingdom) | 7.3 | 7500.0 | David Mamet | Terence Rattigan | Rebecca Pidgeon | United Kingdom | NaN | 3957934.0 | Winslow Partners Ltd. | 104.0 |
| 10 | The Rookie | G | Drama | 2002 | March 29, 2002 (United States) | 6.9 | 33000.0 | John Lee Hancock | Mike Rich | Dennis Quaid | United States | 22000000.0 | 80693537.0 | 98 MPH Productions | 127.0 |
The demo_pg dataframe has 106 rows; of those rows, 59 have to be chosen to fit the criteria for the Drama_df dataframe. These are the chosen indices for the 59 rows.
demo_pg_index = [0,1,101,102,103,104,105,3,2,86,87,88,89,90,91,92,93,94,95,
96,97,98,100,75,76,77,78,79,82,83,85,74,73,71,70,69,68,67,66,65,
5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,24,25]
print(demo_pg_index) #showing the demo_pg_index list
[0, 1, 101, 102, 103, 104, 105, 3, 2, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 100, 75, 76, 77, 78, 79, 82, 83, 85, 74, 73, 71, 70, 69, 68, 67, 66, 65, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25]
Checking the number of elements in the 'demo_pg_index' list.
len(demo_pg_index)
60
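Because the list of positions is written by hand, a quick sanity check can catch repeated or out-of-range entries before they are passed to `iloc`. The helper name `check_selection` and the sample list below are hypothetical and chosen to trigger both problems on purpose.

```python
def check_selection(index_list, n_rows):
    """Summarise a hand-written list of row positions (hypothetical helper)."""
    return {
        "count": len(index_list),
        "has_duplicates": len(set(index_list)) != len(index_list),
        "out_of_range": [i for i in index_list if not 0 <= i < n_rows],
    }


# Deliberately flawed sample: 3 is repeated and 120 exceeds a 106-row frame.
report = check_selection([0, 1, 101, 3, 2, 3, 120], n_rows=106)
print(report)
```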
This is the Worldwide Gross of the 59 new 'PG' rated movies that will be added to the Drama_df dataframe.
worldwide_pg = [9709597, 46918287,542351353,73986904,305937718,216601214,38102988,27118000,
534816,37306334,47494916,19344615,38741732,114830111,43545364,18948425,3438735,
137587063,64605762,33473297,89137047,8526288,64667874,106269971,35656130,3987768,
7025496,152036382,171120329,13835130,14859394,134582776,6101815,
63954968,10769960,32255440,15164458,127956187,2819485,43440294,17815212,157297525,
35856053,119285432,40716963,14920781,3281232,14923752,125052686,549368315,6668025,199078,
64892670,4786789,8443124,2044892,2400000,1705908,80008942,48000000]
print(worldwide_pg) #showing the worldwide_pg list
[9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 534816, 37306334, 47494916, 19344615, 38741732, 114830111, 43545364, 18948425, 3438735, 137587063, 64605762, 33473297, 89137047, 8526288, 64667874, 106269971, 35656130, 3987768, 7025496, 152036382, 171120329, 13835130, 14859394, 134582776, 6101815, 63954968, 10769960, 32255440, 15164458, 127956187, 2819485, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14920781, 3281232, 14923752, 125052686, 549368315, 6668025, 199078, 64892670, 4786789, 8443124, 2044892, 2400000, 1705908, 80008942, 48000000]
Checking the number of elements in the 'worldwide_pg' list.
len(worldwide_pg)
60
This is the Domestic Gross of the 59 new 'PG' rated movies that will be added to the Drama_df dataframe.
domestic_pg = [0,0,201151353,67790117,132422809,108101214,34700142,0,0,27796042,41281092,19161999,
32751093,52330111,0,18848430,0,82272442,31664162,33456317,62950384,3493000,60705732,82569971,
0,0,0,104636382,100920329,10162034,0,43182776,0,22954968,0,0,0,55956187,0,0,0,37686805,
0,0,0,0,0,0,125049125,218815487,0,0,0,0,0,1537122,0,705908,80000000,0]
print(domestic_pg) #showing the domestic_pg list
[0, 0, 201151353, 67790117, 132422809, 108101214, 34700142, 0, 0, 27796042, 41281092, 19161999, 32751093, 52330111, 0, 18848430, 0, 82272442, 31664162, 33456317, 62950384, 3493000, 60705732, 82569971, 0, 0, 0, 104636382, 100920329, 10162034, 0, 43182776, 0, 22954968, 0, 0, 0, 55956187, 0, 0, 0, 37686805, 0, 0, 0, 0, 0, 0, 125049125, 218815487, 0, 0, 0, 0, 0, 1537122, 0, 705908, 80000000, 0]
Checking the number of elements in the 'domestic_pg' list.
len(domestic_pg)
60
This is the Foreign Gross of the 59 new 'PG' rated movies that will be added to the Drama_df dataframe. This is calculated by subtracting the Domestic Gross from the Worldwide Gross of each movie; where the Domestic Gross is recorded as 0, the Foreign Gross is also set to 0.
foreign_pg = []
for i, x in enumerate(worldwide_pg):
    if domestic_pg[i] == 0:
        foreign_pg.append(0)
    else:
        foreign_pg.append(x - domestic_pg[i])
print(foreign_pg) #showing the foreign_pg list
[0, 0, 341200000, 6196787, 173514909, 108500000, 3402846, 0, 0, 9510292, 6213824, 182616, 5990639, 62500000, 0, 99995, 0, 55314621, 32941600, 16980, 26186663, 5033288, 3962142, 23700000, 0, 0, 0, 47400000, 70200000, 3673096, 0, 91400000, 0, 41000000, 0, 0, 0, 72000000, 0, 0, 0, 119610720, 0, 0, 0, 0, 0, 0, 3561, 330552828, 0, 0, 0, 0, 0, 507770, 0, 1000000, 8942, 0]
Checking the number of elements in the 'foreign_pg' list.
len(foreign_pg)
60
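The same rule can be expressed as a single comprehension over paired values. The block below is a minimal sketch, not the notebook's own code: the helper name `foreign_gross` is hypothetical, and the sample data is just the first five entries of `worldwide_pg` and `domestic_pg` shown above.

```python
def foreign_gross(worldwide, domestic):
    """Return worldwide - domestic per film, or 0 where domestic is recorded as 0."""
    if len(worldwide) != len(domestic):
        raise ValueError("both lists must have the same length")
    # zip pairs the two lists position by position, so no manual indexing is needed
    return [w - d if d != 0 else 0 for w, d in zip(worldwide, domestic)]


# First five entries of worldwide_pg and domestic_pg from the cells above
worldwide_sample = [9709597, 46918287, 542351353, 73986904, 305937718]
domestic_sample = [0, 0, 201151353, 67790117, 132422809]

foreign_sample = foreign_gross(worldwide_sample, domestic_sample)
print(foreign_sample)  # [0, 0, 341200000, 6196787, 173514909]
```

The length check makes a mismatch between the two lists fail loudly instead of silently truncating the result.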
Creating the demo1_pg dataframe with the 60 newly chosen 'PG'-rated movies that will be added to the Drama_df dataframe.
demo1_pg = demo_pg.iloc[demo_pg_index]
Resetting the index in the demo1_pg dataframe
demo1_pg = demo1_pg.reset_index(drop=True)
Checking the dataframe and getting the first five rows of the dataframe
demo1_pg.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | 1980 | October 3, 1980 (United States) | 7.2 | 27000.0 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | United States | 5100000.0 | 9709597.0 | Rastar Pictures | 103.0 |
| 1 | Urban Cowboy | PG | Drama | 1980 | June 6, 1980 (United States) | 6.4 | 14000.0 | James Bridges | Aaron Latham | John Travolta | United States | NaN | 46918287.0 | Paramount Pictures | 132.0 |
| 2 | Cinderella | PG | Drama | 2015 | March 13, 2015 (United States) | 6.9 | 165000.0 | Kenneth Branagh | Chris Weitz | Lily James | United States | 95000000.0 | 542358331.0 | Allison Shearmur Productions | 105.0 |
| 3 | War Room | PG | Drama | 2015 | August 28, 2015 (United States) | 6.5 | 14000.0 | Alex Kendrick | Alex Kendrick | Priscilla C. Shirer | United States | 3000000.0 | 73256266.0 | FaithStep Films | 120.0 |
| 4 | Wonder | PG | Drama | 2017 | November 17, 2017 (United States) | 8.0 | 150000.0 | Stephen Chbosky | Stephen Chbosky | Jacob Tremblay | United States | 20000000.0 | 306209289.0 | Lionsgate | 113.0 |
The 'budget' column in the demo1_pg dataframe contains 'NaN' values. The cell below replaces every 'NaN' in the demo1_pg dataframe with 0.
demo1_pg = demo1_pg.fillna(0)
Checking the dataframe and getting the first five rows of the dataframe
demo1_pg.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | 1980 | October 3, 1980 (United States) | 7.2 | 27000.0 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | United States | 5100000.0 | 9709597.0 | Rastar Pictures | 103.0 |
| 1 | Urban Cowboy | PG | Drama | 1980 | June 6, 1980 (United States) | 6.4 | 14000.0 | James Bridges | Aaron Latham | John Travolta | United States | 0.0 | 46918287.0 | Paramount Pictures | 132.0 |
| 2 | Cinderella | PG | Drama | 2015 | March 13, 2015 (United States) | 6.9 | 165000.0 | Kenneth Branagh | Chris Weitz | Lily James | United States | 95000000.0 | 542358331.0 | Allison Shearmur Productions | 105.0 |
| 3 | War Room | PG | Drama | 2015 | August 28, 2015 (United States) | 6.5 | 14000.0 | Alex Kendrick | Alex Kendrick | Priscilla C. Shirer | United States | 3000000.0 | 73256266.0 | FaithStep Films | 120.0 |
| 4 | Wonder | PG | Drama | 2017 | November 17, 2017 (United States) | 8.0 | 150000.0 | Stephen Chbosky | Stephen Chbosky | Jacob Tremblay | United States | 20000000.0 | 306209289.0 | Lionsgate | 113.0 |
Collecting the row indices of the movies whose budget is 0 in the demo1_pg dataframe.
nan_index = []
for i, x in enumerate(demo1_pg.budget):
    if x == 0.0:
        nan_index.append(i)
print(nan_index) #showing the nan_index list
[1, 7, 16, 21, 37, 40, 41, 43, 45, 47, 49, 50, 53, 54, 55, 56, 57]
Checking the number of elements in the 'nan_index' list.
len(nan_index)
17
The actual budgets of the movies in the demo1_pg dataframe whose budget was recorded as 0.
budget = [10000000.0, 422000, 9000000.0, 11000000.0, 20000000.0,
5000000.0, 7000000.0, 15000000.0, 28300000.0, 7500000.0,
5000000.0, 9000000.0, 5000000.0, 4500000.0, 4500000.0,
8000000.0, 16000000.0]
print(budget) #showing the budget list
[10000000.0, 422000, 9000000.0, 11000000.0, 20000000.0, 5000000.0, 7000000.0, 15000000.0, 28300000.0, 7500000.0, 5000000.0, 9000000.0, 5000000.0, 4500000.0, 4500000.0, 8000000.0, 16000000.0]
Replacing every 0 in the 'budget' column of the demo1_pg dataframe with the actual budget of the movie.
for i, x in enumerate(nan_index):
    demo1_pg.loc[x, 'budget'] = budget[i]
Checking the number of rows in the demo1_pg dataframe.
len(demo1_pg)
60
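As an alternative to filling with 0 and then patching by position, the missing budgets can be filled by title while the values are still 'NaN'. The following is a minimal sketch under stated assumptions: `toy` is a small stand-in for demo1_pg, and `researched` is a hypothetical title-to-budget lookup that is not part of the notebook. The budget values are the ones shown in the tables above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for demo1_pg with one missing budget
toy = pd.DataFrame({
    "movie": ["Somewhere in Time", "Urban Cowboy", "Cinderella"],
    "budget": [5100000.0, np.nan, 95000000.0],
})

# Hypothetical lookup of researched budgets, keyed by movie title
researched = {"Urban Cowboy": 10000000.0}

# Select only the rows whose budget is missing and fill them from the lookup
missing = toy["budget"].isna()
toy.loc[missing, "budget"] = toy.loc[missing, "movie"].map(researched)

print(toy)
```

Keying the replacement values by title avoids relying on two parallel lists staying in the same order, and keeping 'NaN' as the missing marker avoids confusing a missing budget with a genuine value of 0.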
This is the Profit of the 60 new 'PG'-rated movies that will be added to the Drama_df dataframe. It is calculated by subtracting the Budget of each movie from its Worldwide Gross.
profit_pg = []
for i, x in enumerate(worldwide_pg):
    profit_pg.append(x - demo1_pg.budget[i])
print(profit_pg) #showing the profit_pg list
[4609597.0, 36918287.0, 447351353.0, 70986904.0, 285937718.0, 176601214.0, 33102988.0, 26696000.0, -4565184.0, -34693666.0, 35694916.0, 4344615.0, 6741732.0, 74830111.0, -21454636.0, 10948425.0, -5561265.0, 120587063.0, 34605762.0, 32973297.0, 69137047.0, -2473712.0, 62667874.0, 83269971.0, -9343870.0, -11012232.0, -2974504.0, 120036382.0, 81120329.0, 3835130.0, -12140606.0, 118582776.0, 3101815.0, 48954968.0, -14230040.0, -1744560.0, 5164458.0, 107956187.0, -12180515.0, 31440294.0, 12815212.0, 150297525.0, 21856053.0, 104285432.0, 28716963.0, -13379219.0, -4718768.0, 7423752.0, 108052686.0, 544368315.0, -2331975.0, -14800922.0, 42892670.0, -213211.0, 3943124.0, -2455108.0, -5600000.0, -14294092.0, 71808942.0, 20000000.0]
Checking the number of elements in the 'profit_pg' list.
len(profit_pg)
60
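The profit calculation can also be done as a column operation in pandas. This is a minimal sketch, assuming the worldwide gross has already been placed in a column; `toy` is a hypothetical stand-in for demo1_pg that holds the first three movies from the tables above.

```python
import pandas as pd

# Toy stand-in for demo1_pg: first three movies, values taken from the cells above
toy = pd.DataFrame({
    "Worldwide_Gross": [9709597, 46918287, 542351353],
    "budget": [5100000.0, 10000000.0, 95000000.0],
})

# Column arithmetic subtracts row by row, aligned on the dataframe index
toy["Profit"] = toy["Worldwide_Gross"] - toy["budget"]

print(toy["Profit"].tolist())  # [4609597.0, 36918287.0, 447351353.0]
```

Because both operands live in the same dataframe, the subtraction cannot drift out of step the way two separately maintained lists can.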
This is the Number of Tickets Sold for the 60 new 'PG'-rated movies that will be added to the Drama_df dataframe. It is calculated by dividing the Worldwide Gross by 10, which is taken as the average ticket price worldwide.
no_tickets_pg = []
for i in worldwide_pg:
    no_tickets_pg.append(round(i / 10))
print(no_tickets_pg) #showing the no_tickets_pg list
[970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 53482, 3730633, 4749492, 1934462, 3874173, 11483011, 4354536, 1894842, 343874, 13758706, 6460576, 3347330, 8913705, 852629, 6466787, 10626997, 3565613, 398777, 702550, 15203638, 17112033, 1383513, 1485939, 13458278, 610182, 6395497, 1076996, 3225544, 1516446, 12795619, 281948, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492078, 328123, 1492375, 12505269, 54936832, 666802, 19908, 6489267, 478679, 844312, 204489, 240000, 170591, 8000894, 4800000]
Checking the number of elements in the 'no_tickets_pg' list.
len(no_tickets_pg)
60
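Giving the ticket price a name makes the assumption behind the estimate explicit and easy to change. The block below is a small sketch: `AVERAGE_TICKET_PRICE` and `estimate_tickets` are hypothetical names, and the value 10 is the assumption stated in the text above.

```python
AVERAGE_TICKET_PRICE = 10  # assumed average ticket price, as stated in the text


def estimate_tickets(gross, price=AVERAGE_TICKET_PRICE):
    """Estimate tickets sold as gross revenue divided by an average ticket price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return round(gross / price)


# First three entries of worldwide_pg from the cells above
gross_sample = [9709597, 46918287, 542351353]
tickets_sample = [estimate_tickets(g) for g in gross_sample]
print(tickets_sample)  # [970960, 4691829, 54235135]
```

Since the result is an estimate that depends entirely on the assumed price, passing a different `price` is a quick way to see how sensitive the ticket counts are to that assumption.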
The new columns are then added to the 'PG'-rated movie dataframe demo1_pg, which will later be added to the Drama_df dataframe.
demo1_pg['Worldwide_Gross'] = worldwide_pg
demo1_pg["Foreign_Gross"] = foreign_pg
demo1_pg['Domestic_Gross'] = domestic_pg
demo1_pg["Profit"] = profit_pg
demo1_pg['Tickets'] = no_tickets_pg
Showing the first five rows of the 'PG'-rated movie dataframe demo1_pg with the new columns added.
demo1_pg.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime | Worldwide_Gross | Foreign_Gross | Domestic_Gross | Profit | Tickets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | 1980 | October 3, 1980 (United States) | 7.2 | 27000.0 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | United States | 5100000.0 | 9709597.0 | Rastar Pictures | 103.0 | 9709597 | 0 | 0 | 4609597.0 | 970960 |
| 1 | Urban Cowboy | PG | Drama | 1980 | June 6, 1980 (United States) | 6.4 | 14000.0 | James Bridges | Aaron Latham | John Travolta | United States | 10000000.0 | 46918287.0 | Paramount Pictures | 132.0 | 46918287 | 0 | 0 | 36918287.0 | 4691829 |
| 2 | Cinderella | PG | Drama | 2015 | March 13, 2015 (United States) | 6.9 | 165000.0 | Kenneth Branagh | Chris Weitz | Lily James | United States | 95000000.0 | 542358331.0 | Allison Shearmur Productions | 105.0 | 542351353 | 341200000 | 201151353 | 447351353.0 | 54235135 |
| 3 | War Room | PG | Drama | 2015 | August 28, 2015 (United States) | 6.5 | 14000.0 | Alex Kendrick | Alex Kendrick | Priscilla C. Shirer | United States | 3000000.0 | 73256266.0 | FaithStep Films | 120.0 | 73986904 | 6196787 | 67790117 | 70986904.0 | 7398690 |
| 4 | Wonder | PG | Drama | 2017 | November 17, 2017 (United States) | 8.0 | 150000.0 | Stephen Chbosky | Stephen Chbosky | Jacob Tremblay | United States | 20000000.0 | 306209289.0 | Lionsgate | 113.0 | 305937718 | 173514909 | 132422809 | 285937718.0 | 30593772 |
Creating the Foreign_Gross_x column by formatting the foreign gross values as currency strings. It will be added to the demo1_pg dataframe.
foreign_gross_pgx = []
for i in foreign_pg:
    foreign_gross_pgx.append("${:,.0f}".format(i))
print(foreign_gross_pgx) #showing the foreign_gross_pgx list
['$0', '$0', '$341,200,000', '$6,196,787', '$173,514,909', '$108,500,000', '$3,402,846', '$0', '$0', '$9,510,292', '$6,213,824', '$182,616', '$5,990,639', '$62,500,000', '$0', '$99,995', '$0', '$55,314,621', '$32,941,600', '$16,980', '$26,186,663', '$5,033,288', '$3,962,142', '$23,700,000', '$0', '$0', '$0', '$47,400,000', '$70,200,000', '$3,673,096', '$0', '$91,400,000', '$0', '$41,000,000', '$0', '$0', '$0', '$72,000,000', '$0', '$0', '$0', '$119,610,720', '$0', '$0', '$0', '$0', '$0', '$0', '$3,561', '$330,552,828', '$0', '$0', '$0', '$0', '$0', '$507,770', '$0', '$1,000,000', '$8,942', '$0']
Checking the number of elements in the 'foreign_gross_pgx' list.
len(foreign_gross_pgx)
60
Creating the Worldwide_Gross_x column by formatting the worldwide gross values as currency strings. It will be added to the demo1_pg dataframe.
worldwide_gross_pgx = []
for i in worldwide_pg:
    worldwide_gross_pgx.append("${:,.0f}".format(i))
print(worldwide_gross_pgx) #showing the worldwide_gross_pgx list
['$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$534,816', '$37,306,334', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$43,545,364', '$18,948,425', '$3,438,735', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$8,526,288', '$64,667,874', '$106,269,971', '$35,656,130', '$3,987,768', '$7,025,496', '$152,036,382', '$171,120,329', '$13,835,130', '$14,859,394', '$134,582,776', '$6,101,815', '$63,954,968', '$10,769,960', '$32,255,440', '$15,164,458', '$127,956,187', '$2,819,485', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,920,781', '$3,281,232', '$14,923,752', '$125,052,686', '$549,368,315', '$6,668,025', '$199,078', '$64,892,670', '$4,786,789', '$8,443,124', '$2,044,892', '$2,400,000', '$1,705,908', '$80,008,942', '$48,000,000']
Checking the number of elements in the 'worldwide_gross_pgx' list.
len(worldwide_gross_pgx)
60
Creating the Domestic_Gross_x column by formatting the domestic gross values as currency strings. It will be added to the demo1_pg dataframe.
domestic_gross_pgx = []
for i in domestic_pg:
    domestic_gross_pgx.append("${:,.0f}".format(i))
print(domestic_gross_pgx) #showing the domestic_gross_pgx list
['$0', '$0', '$201,151,353', '$67,790,117', '$132,422,809', '$108,101,214', '$34,700,142', '$0', '$0', '$27,796,042', '$41,281,092', '$19,161,999', '$32,751,093', '$52,330,111', '$0', '$18,848,430', '$0', '$82,272,442', '$31,664,162', '$33,456,317', '$62,950,384', '$3,493,000', '$60,705,732', '$82,569,971', '$0', '$0', '$0', '$104,636,382', '$100,920,329', '$10,162,034', '$0', '$43,182,776', '$0', '$22,954,968', '$0', '$0', '$0', '$55,956,187', '$0', '$0', '$0', '$37,686,805', '$0', '$0', '$0', '$0', '$0', '$0', '$125,049,125', '$218,815,487', '$0', '$0', '$0', '$0', '$0', '$1,537,122', '$0', '$705,908', '$80,000,000', '$0']
Checking the number of elements in the 'domestic_gross_pgx' list.
len(domestic_gross_pgx)
60
Creating the Profit_x column by formatting the profit values as currency strings. It will be added to the demo1_pg dataframe.
profit_pgx = []
for i in profit_pg:
    profit_pgx.append("${:,.0f}".format(i))
print(profit_pgx) #showing the profit_pgx list
['$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$-4,565,184', '$-34,693,666', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$-21,454,636', '$10,948,425', '$-5,561,265', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$-2,473,712', '$62,667,874', '$83,269,971', '$-9,343,870', '$-11,012,232', '$-2,974,504', '$120,036,382', '$81,120,329', '$3,835,130', '$-12,140,606', '$118,582,776', '$3,101,815', '$48,954,968', '$-14,230,040', '$-1,744,560', '$5,164,458', '$107,956,187', '$-12,180,515', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$-13,379,219', '$-4,718,768', '$7,423,752', '$108,052,686', '$544,368,315', '$-2,331,975', '$-14,800,922', '$42,892,670', '$-213,211', '$3,943,124', '$-2,455,108', '$-5,600,000', '$-14,294,092', '$71,808,942', '$20,000,000']
Checking the number of elements in the 'profit_pgx' list.
len(profit_pgx)
60
Creating the Tickets_x column by formatting the ticket counts as strings with thousands separators. It will be added to the demo1_pg dataframe.
str_tickets_pgx = []
for i in no_tickets_pg:
    str_tickets_pgx.append("{:,.0f}".format(i))
print(str_tickets_pgx) #showing the str_tickets_pgx list
['970,960', '4,691,829', '54,235,135', '7,398,690', '30,593,772', '21,660,121', '3,810,299', '2,711,800', '53,482', '3,730,633', '4,749,492', '1,934,462', '3,874,173', '11,483,011', '4,354,536', '1,894,842', '343,874', '13,758,706', '6,460,576', '3,347,330', '8,913,705', '852,629', '6,466,787', '10,626,997', '3,565,613', '398,777', '702,550', '15,203,638', '17,112,033', '1,383,513', '1,485,939', '13,458,278', '610,182', '6,395,497', '1,076,996', '3,225,544', '1,516,446', '12,795,619', '281,948', '4,344,029', '1,781,521', '15,729,752', '3,585,605', '11,928,543', '4,071,696', '1,492,078', '328,123', '1,492,375', '12,505,269', '54,936,832', '666,802', '19,908', '6,489,267', '478,679', '844,312', '204,489', '240,000', '170,591', '8,000,894', '4,800,000']
Checking the number of elements in the 'str_tickets_pgx' list.
len(str_tickets_pgx)
60
Creating the Production_Budget_x column by formatting the budget values from the demo1_pg dataframe as currency strings. It will be added to the demo1_pg dataframe.
str_budget_pgx = []
for i in demo1_pg.budget:
    str_budget_pgx.append("${:,.0f}".format(i))
print(str_budget_pgx) #showing the str_budget_pgx list
['$5,100,000', '$10,000,000', '$95,000,000', '$3,000,000', '$20,000,000', '$40,000,000', '$5,000,000', '$422,000', '$5,100,000', '$72,000,000', '$11,800,000', '$15,000,000', '$32,000,000', '$40,000,000', '$65,000,000', '$8,000,000', '$9,000,000', '$17,000,000', '$30,000,000', '$500,000', '$20,000,000', '$11,000,000', '$2,000,000', '$23,000,000', '$45,000,000', '$15,000,000', '$10,000,000', '$32,000,000', '$90,000,000', '$10,000,000', '$27,000,000', '$16,000,000', '$3,000,000', '$15,000,000', '$25,000,000', '$34,000,000', '$10,000,000', '$20,000,000', '$15,000,000', '$12,000,000', '$5,000,000', '$7,000,000', '$14,000,000', '$15,000,000', '$12,000,000', '$28,300,000', '$8,000,000', '$7,500,000', '$17,000,000', '$5,000,000', '$9,000,000', '$15,000,000', '$22,000,000', '$5,000,000', '$4,500,000', '$4,500,000', '$8,000,000', '$16,000,000', '$8,200,000', '$28,000,000']
Checking the number of elements in the 'str_budget_pgx' list.
len(str_budget_pgx)
60
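The six formatting loops above share one pattern, so they could be served by a pair of small helpers. This is a sketch rather than the notebook's code: `as_currency` and `as_count` are hypothetical names. Note one deliberate difference: the helper places the minus sign before the dollar sign (`-$4,565,184`), whereas the loops above produce `$-4,565,184`.

```python
def as_currency(value):
    """Format a number as whole dollars with thousands separators."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.0f}"


def as_count(value):
    """Format a number with thousands separators and no decimals."""
    return f"{value:,.0f}"


# Values taken from the profit and ticket lists above
print(as_currency(447351353.0))  # $447,351,353
print(as_currency(-4565184.0))   # -$4,565,184
print(as_count(970960))          # 970,960

# Applying a helper to a whole list replaces one explicit loop
profit_sample = [4609597.0, 36918287.0, -4565184.0]
profit_labels = [as_currency(p) for p in profit_sample]
```

These formatted strings are for display only; calculations and sorting should keep using the numeric columns.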
The formatted columns are then added to the 'PG'-rated movie dataframe demo1_pg, which will later be added to the Drama_df dataframe.
demo1_pg['Worldwide_Gross_x'] = worldwide_gross_pgx
demo1_pg["Foreign_Gross_x"] = foreign_gross_pgx
demo1_pg['Domestic_Gross_x'] = domestic_gross_pgx
demo1_pg["Profit_x"] = profit_pgx
demo1_pg['Tickets_x'] = str_tickets_pgx
demo1_pg['Production_Budget_x'] = str_budget_pgx
Showing the first five rows of the 'PG'-rated dataframe demo1_pg with the new columns added.
demo1_pg.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | ... | Foreign_Gross | Domestic_Gross | Profit | Tickets | Worldwide_Gross_x | Foreign_Gross_x | Domestic_Gross_x | Profit_x | Tickets_x | Production_Budget_x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | 1980 | October 3, 1980 (United States) | 7.2 | 27000.0 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | ... | 0 | 0 | 4609597.0 | 970960 | $9,709,597 | $0 | $0 | $4,609,597 | 970,960 | $5,100,000 |
| 1 | Urban Cowboy | PG | Drama | 1980 | June 6, 1980 (United States) | 6.4 | 14000.0 | James Bridges | Aaron Latham | John Travolta | ... | 0 | 0 | 36918287.0 | 4691829 | $46,918,287 | $0 | $0 | $36,918,287 | 4,691,829 | $10,000,000 |
| 2 | Cinderella | PG | Drama | 2015 | March 13, 2015 (United States) | 6.9 | 165000.0 | Kenneth Branagh | Chris Weitz | Lily James | ... | 341200000 | 201151353 | 447351353.0 | 54235135 | $542,351,353 | $341,200,000 | $201,151,353 | $447,351,353 | 54,235,135 | $95,000,000 |
| 3 | War Room | PG | Drama | 2015 | August 28, 2015 (United States) | 6.5 | 14000.0 | Alex Kendrick | Alex Kendrick | Priscilla C. Shirer | ... | 6196787 | 67790117 | 70986904.0 | 7398690 | $73,986,904 | $6,196,787 | $67,790,117 | $70,986,904 | 7,398,690 | $3,000,000 |
| 4 | Wonder | PG | Drama | 2017 | November 17, 2017 (United States) | 8.0 | 150000.0 | Stephen Chbosky | Stephen Chbosky | Jacob Tremblay | ... | 173514909 | 132422809 | 285937718.0 | 30593772 | $305,937,718 | $173,514,909 | $132,422,809 | $285,937,718 | 30,593,772 | $20,000,000 |
5 rows × 26 columns
Showing all the information of the demo1_pg dataframe after adding the new columns.
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 26 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   movie                60 non-null     object
 1   rating               60 non-null     object
 2   genre                60 non-null     object
 3   year                 60 non-null     int64
 4   released             60 non-null     object
 5   score                60 non-null     float64
 6   votes                60 non-null     float64
 7   director             60 non-null     object
 8   writer               60 non-null     object
 9   star                 60 non-null     object
 10  country              60 non-null     object
 11  budget               60 non-null     float64
 12  gross                60 non-null     float64
 13  company              60 non-null     object
 14  runtime              60 non-null     float64
 15  Worldwide_Gross      60 non-null     int64
 16  Foreign_Gross        60 non-null     int64
 17  Domestic_Gross       60 non-null     int64
 18  Profit               60 non-null     float64
 19  Tickets              60 non-null     int64
 20  Worldwide_Gross_x    60 non-null     object
 21  Foreign_Gross_x      60 non-null     object
 22  Domestic_Gross_x     60 non-null     object
 23  Profit_x             60 non-null     object
 24  Tickets_x            60 non-null     object
 25  Production_Budget_x  60 non-null     object
dtypes: float64(6), int64(5), object(15)
memory usage: 12.3+ KB
Showing all the information of the Drama_df dataframe to make sure the demo1_pg dataframe columns align with the Drama_df dataframe columns, so that the two dataframes can be appended to each other.
Drama_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Movie                154 non-null    object
 1   Release_Date         154 non-null    object
 2   Genre                154 non-null    object
 3   Rating               154 non-null    object
 4   Production_Budget    154 non-null    int64
 5   Production_Budget_x  154 non-null    object
 6   Domestic_Gross       154 non-null    int64
 7   Domestic_Gross_x     154 non-null    object
 8   Foreign_Gross        127 non-null    float64
 9   Foreign_Gross_x      127 non-null    object
 10  Worldwide_Gross      154 non-null    object
 11  Worldwide_Gross_x    154 non-null    int64
 12  Profit               154 non-null    int64
 13  Profit_x             154 non-null    object
 14  Tickets              154 non-null    int64
 15  Tickets_x            154 non-null    object
 16  Runtime              153 non-null    float64
 17  Averagerating        154 non-null    float64
 18  Company              153 non-null    object
 19  Studio               154 non-null    object
 20  Star                 154 non-null    object
 21  Director             154 non-null    object
 22  Writer               154 non-null    object
dtypes: float64(3), int64(5), object(15)
memory usage: 27.8+ KB
Deleting four columns from the demo1_pg dataframe to align it with the Drama_df dataframe.
demo1_pg = demo1_pg.drop(['year', 'votes', 'country', 'gross'], axis=1)
Checking the dataframe and getting the first five rows of the dataframe
demo1_pg.head()
| | movie | rating | genre | released | score | director | writer | star | budget | company | ... | Foreign_Gross | Domestic_Gross | Profit | Tickets | Worldwide_Gross_x | Foreign_Gross_x | Domestic_Gross_x | Profit_x | Tickets_x | Production_Budget_x |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | PG | Drama | October 3, 1980 (United States) | 7.2 | Jeannot Szwarc | Richard Matheson | Christopher Reeve | 5100000.0 | Rastar Pictures | ... | 0 | 0 | 4609597.0 | 970960 | $9,709,597 | $0 | $0 | $4,609,597 | 970,960 | $5,100,000 |
| 1 | Urban Cowboy | PG | Drama | June 6, 1980 (United States) | 6.4 | James Bridges | Aaron Latham | John Travolta | 10000000.0 | Paramount Pictures | ... | 0 | 0 | 36918287.0 | 4691829 | $46,918,287 | $0 | $0 | $36,918,287 | 4,691,829 | $10,000,000 |
| 2 | Cinderella | PG | Drama | March 13, 2015 (United States) | 6.9 | Kenneth Branagh | Chris Weitz | Lily James | 95000000.0 | Allison Shearmur Productions | ... | 341200000 | 201151353 | 447351353.0 | 54235135 | $542,351,353 | $341,200,000 | $201,151,353 | $447,351,353 | 54,235,135 | $95,000,000 |
| 3 | War Room | PG | Drama | August 28, 2015 (United States) | 6.5 | Alex Kendrick | Alex Kendrick | Priscilla C. Shirer | 3000000.0 | FaithStep Films | ... | 6196787 | 67790117 | 70986904.0 | 7398690 | $73,986,904 | $6,196,787 | $67,790,117 | $70,986,904 | 7,398,690 | $3,000,000 |
| 4 | Wonder | PG | Drama | November 17, 2017 (United States) | 8.0 | Stephen Chbosky | Stephen Chbosky | Jacob Tremblay | 20000000.0 | Lionsgate | ... | 173514909 | 132422809 | 285937718.0 | 30593772 | $305,937,718 | $173,514,909 | $132,422,809 | $285,937,718 | 30,593,772 | $20,000,000 |
5 rows × 22 columns
Showing all the information of the demo1_pg dataframe to make sure the four columns were deleted.
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   movie                60 non-null     object
 1   rating               60 non-null     object
 2   genre                60 non-null     object
 3   released             60 non-null     object
 4   score                60 non-null     float64
 5   director             60 non-null     object
 6   writer               60 non-null     object
 7   star                 60 non-null     object
 8   budget               60 non-null     float64
 9   company              60 non-null     object
 10  runtime              60 non-null     float64
 11  Worldwide_Gross      60 non-null     int64
 12  Foreign_Gross        60 non-null     int64
 13  Domestic_Gross       60 non-null     int64
 14  Profit               60 non-null     float64
 15  Tickets              60 non-null     int64
 16  Worldwide_Gross_x    60 non-null     object
 17  Foreign_Gross_x      60 non-null     object
 18  Domestic_Gross_x     60 non-null     object
 19  Profit_x             60 non-null     object
 20  Tickets_x            60 non-null     object
 21  Production_Budget_x  60 non-null     object
dtypes: float64(4), int64(4), object(14)
memory usage: 10.4+ KB
Rearranging the columns in the demo1_pg dataframe to match the column order of the Drama_df dataframe, so that the two can be appended.
demo1_pg = demo1_pg[['movie','released','genre','rating','budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','score',
'company','star','director','writer']]
Checking the dataframe and getting the first five rows of the dataframe
demo1_pg.head()
| | movie | released | genre | rating | budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | runtime | score | company | star | director | writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | October 3, 1980 (United States) | Drama | PG | 5100000.0 | $5,100,000 | 0 | $0 | 0 | $0 | ... | 4609597.0 | $4,609,597 | 970960 | 970,960 | 103.0 | 7.2 | Rastar Pictures | Christopher Reeve | Jeannot Szwarc | Richard Matheson |
| 1 | Urban Cowboy | June 6, 1980 (United States) | Drama | PG | 10000000.0 | $10,000,000 | 0 | $0 | 0 | $0 | ... | 36918287.0 | $36,918,287 | 4691829 | 4,691,829 | 132.0 | 6.4 | Paramount Pictures | John Travolta | James Bridges | Aaron Latham |
| 2 | Cinderella | March 13, 2015 (United States) | Drama | PG | 95000000.0 | $95,000,000 | 201151353 | $201,151,353 | 341200000 | $341,200,000 | ... | 447351353.0 | $447,351,353 | 54235135 | 54,235,135 | 105.0 | 6.9 | Allison Shearmur Productions | Lily James | Kenneth Branagh | Chris Weitz |
| 3 | War Room | August 28, 2015 (United States) | Drama | PG | 3000000.0 | $3,000,000 | 67790117 | $67,790,117 | 6196787 | $6,196,787 | ... | 70986904.0 | $70,986,904 | 7398690 | 7,398,690 | 120.0 | 6.5 | FaithStep Films | Priscilla C. Shirer | Alex Kendrick | Alex Kendrick |
| 4 | Wonder | November 17, 2017 (United States) | Drama | PG | 20000000.0 | $20,000,000 | 132422809 | $132,422,809 | 173514909 | $173,514,909 | ... | 285937718.0 | $285,937,718 | 30593772 | 30,593,772 | 113.0 | 8.0 | Lionsgate | Jacob Tremblay | Stephen Chbosky | Stephen Chbosky |
5 rows × 22 columns
Renaming the columns in the demo1_pg dataframe to match the column names of the Drama_df dataframe, so that the two can be appended.
demo1_pg.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
'Company','Star','Director','Writer']
Checking the dataframe and getting the first five rows of the dataframe
demo1_pg.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Somewhere in Time | October 3, 1980 (United States) | Drama | PG | 5100000.0 | $5,100,000 | 0 | $0 | 0 | $0 | ... | 4609597.0 | $4,609,597 | 970960 | 970,960 | 103.0 | 7.2 | Rastar Pictures | Christopher Reeve | Jeannot Szwarc | Richard Matheson |
| 1 | Urban Cowboy | June 6, 1980 (United States) | Drama | PG | 10000000.0 | $10,000,000 | 0 | $0 | 0 | $0 | ... | 36918287.0 | $36,918,287 | 4691829 | 4,691,829 | 132.0 | 6.4 | Paramount Pictures | John Travolta | James Bridges | Aaron Latham |
| 2 | Cinderella | March 13, 2015 (United States) | Drama | PG | 95000000.0 | $95,000,000 | 201151353 | $201,151,353 | 341200000 | $341,200,000 | ... | 447351353.0 | $447,351,353 | 54235135 | 54,235,135 | 105.0 | 6.9 | Allison Shearmur Productions | Lily James | Kenneth Branagh | Chris Weitz |
| 3 | War Room | August 28, 2015 (United States) | Drama | PG | 3000000.0 | $3,000,000 | 67790117 | $67,790,117 | 6196787 | $6,196,787 | ... | 70986904.0 | $70,986,904 | 7398690 | 7,398,690 | 120.0 | 6.5 | FaithStep Films | Priscilla C. Shirer | Alex Kendrick | Alex Kendrick |
| 4 | Wonder | November 17, 2017 (United States) | Drama | PG | 20000000.0 | $20,000,000 | 132422809 | $132,422,809 | 173514909 | $173,514,909 | ... | 285937718.0 | $285,937,718 | 30593772 | 30,593,772 | 113.0 | 8.0 | Lionsgate | Jacob Tremblay | Stephen Chbosky | Stephen Chbosky |
5 rows × 22 columns
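The reorder and rename steps above can be combined by using an explicit name mapping, which does not depend on the columns already being in the right order. This is a minimal sketch: `rename_map` and `target_order` are hypothetical names, the mapping follows the old and new column names used in the cells above, and `toy` is a one-row stand-in for demo1_pg with only a few of its columns.

```python
import pandas as pd

# Mapping from the source column names to the Drama_df column names
rename_map = {
    "movie": "Movie", "released": "Release_Date", "genre": "Genre",
    "rating": "Rating", "budget": "Production_Budget", "runtime": "Runtime",
    "score": "Averagerating", "company": "Company", "star": "Star",
    "director": "Director", "writer": "Writer",
}

# Desired column order (only a subset here, to keep the example short)
target_order = ["Movie", "Release_Date", "Genre", "Rating", "Production_Budget"]

# One-row stand-in for demo1_pg, with the columns deliberately out of order
toy = pd.DataFrame({
    "budget": [5100000.0],
    "movie": ["Somewhere in Time"],
    "rating": ["PG"],
    "genre": ["Drama"],
    "released": ["October 3, 1980 (United States)"],
})

# Rename by name first, then select the columns in the target order
aligned = toy.rename(columns=rename_map)[target_order]
print(list(aligned.columns))
```

Assigning a full list to `.columns` relabels by position, so a mistake in the earlier reordering step would silently mislabel data; renaming by name avoids that risk.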
Dropping the 'Studio' column from the Drama_df dataframe so that its columns match those of the demo1_pg dataframe and the two dataframes can be appended.
Drama_df = Drama_df.drop(['Studio'], axis=1)
Making sure the 'Studio' column was dropped from the Drama_df dataframe.
Drama_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Movie                154 non-null    object
 1   Release_Date         154 non-null    object
 2   Genre                154 non-null    object
 3   Rating               154 non-null    object
 4   Production_Budget    154 non-null    int64
 5   Production_Budget_x  154 non-null    object
 6   Domestic_Gross       154 non-null    int64
 7   Domestic_Gross_x     154 non-null    object
 8   Foreign_Gross        127 non-null    float64
 9   Foreign_Gross_x      127 non-null    object
 10  Worldwide_Gross      154 non-null    object
 11  Worldwide_Gross_x    154 non-null    int64
 12  Profit               154 non-null    int64
 13  Profit_x             154 non-null    object
 14  Tickets              154 non-null    int64
 15  Tickets_x            154 non-null    object
 16  Runtime              153 non-null    float64
 17  Averagerating        154 non-null    float64
 18  Company              153 non-null    object
 19  Star                 154 non-null    object
 20  Director             154 non-null    object
 21  Writer               154 non-null    object
dtypes: float64(3), int64(5), object(14)
memory usage: 26.6+ KB
Checking the demo1_pg dataframe to make sure it aligns with the Drama_df dataframe.
demo1_pg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Movie                60 non-null     object
 1   Release_Date         60 non-null     object
 2   Genre                60 non-null     object
 3   Rating               60 non-null     object
 4   Production_Budget    60 non-null     float64
 5   Production_Budget_x  60 non-null     object
 6   Domestic_Gross       60 non-null     int64
 7   Domestic_Gross_x     60 non-null     object
 8   Foreign_Gross        60 non-null     int64
 9   Foreign_Gross_x      60 non-null     object
 10  Worldwide_Gross      60 non-null     int64
 11  Worldwide_Gross_x    60 non-null     object
 12  Profit               60 non-null     float64
 13  Profit_x             60 non-null     object
 14  Tickets              60 non-null     int64
 15  Tickets_x            60 non-null     object
 16  Runtime              60 non-null     float64
 17  Averagerating        60 non-null     float64
 18  Company              60 non-null     object
 19  Star                 60 non-null     object
 20  Director             60 non-null     object
 21  Writer               60 non-null     object
dtypes: float64(4), int64(4), object(14)
memory usage: 10.4+ KB
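Comparing two info() outputs by eye is error-prone, so the alignment check can also be done programmatically. The helper below is a hypothetical sketch (`compare_columns` is not part of the notebook) that reports which column names appear on only one side and whether the order matches. The sample lists are shortened versions of the two column sets before 'Studio' was dropped.

```python
def compare_columns(left_cols, right_cols):
    """Report column names unique to each side and whether the order is identical."""
    left, right = list(left_cols), list(right_cols)
    return {
        "only_left": [c for c in left if c not in right],
        "only_right": [c for c in right if c not in left],
        "same_order": left == right,
    }


# Shortened example: Drama_df still has 'Studio', demo1_pg does not
drama_cols = ["Movie", "Release_Date", "Studio", "Star"]
pg_cols = ["Movie", "Release_Date", "Star"]

report = compare_columns(drama_cols, pg_cols)
print(report)
```

With the real dataframes the call would be `compare_columns(Drama_df.columns, demo1_pg.columns)`. The helper compares names only; the two info() outputs above also show differing dtypes for some shared columns (for example Worldwide_Gross), which may be worth checking separately.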
Appending the demo1_pg dataframe to the Drama_df dataframe to create the demo_drama dataframe. Movies with the remaining system ratings ('R', 'G' and 'NC-17') will be added to the demo_drama dataframe later to complete the dataframe for this analysis.
# DataFrame.append is deprecated, so pd.concat is used instead (assumes pandas is imported as pd)
demo_drama = pd.concat([Drama_df, demo1_pg], ignore_index=True)
Checking demo_drama dataframe.
demo_drama
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | ... | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | ... | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | ... | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | ... | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 209 | Testament | January 5, 1984 (Argentina) | Drama | PG | 4500000.0 | $4,500,000 | 1537122 | $1,537,122 | 507770.0 | $507,770 | ... | -2455108.0 | $-2,455,108 | 204489 | 204,489 | 90.0 | 7.0 | Paramount Pictures | Jane Alexander | Lynne Littman | Carol Amen |
| 210 | Table for Five | March 10, 1983 (Australia) | Drama | PG | 8000000.0 | $8,000,000 | 0 | $0 | 0.0 | $0 | ... | -5600000.0 | $-5,600,000 | 240000 | 240,000 | 122.0 | 6.1 | CBS Theatrical Films | Jon Voight | Robert Lieberman | David Seltzer |
| 211 | Man, Woman and Child | September 7, 1983 (France) | Drama | PG | 16000000.0 | $16,000,000 | 705908 | $705,908 | 1000000.0 | $1,000,000 | ... | -14294092.0 | $-14,294,092 | 170591 | 170,591 | 99.0 | 6.1 | Gaylord Productions | Martin Sheen | Dick Richards | Erich Segal |
| 212 | Footloose | February 17, 1984 (United States) | Drama | PG | 8200000.0 | $8,200,000 | 80000000 | $80,000,000 | 8942.0 | $8,942 | ... | 71808942.0 | $71,808,942 | 8000894 | 8,000,894 | 107.0 | 6.6 | Paramount Pictures | Kevin Bacon | Herbert Ross | Dean Pitchford |
| 213 | The Natural | May 11, 1984 (United States) | Drama | PG | 28000000.0 | $28,000,000 | 0 | $0 | 0.0 | $0 | ... | 20000000.0 | $20,000,000 | 4800000 | 4,800,000 | 138.0 | 7.5 | TriStar Pictures | Robert Redford | Barry Levinson | Bernard Malamud |
214 rows × 22 columns
Combining the 'NC-17' rated dataframe demo_nc, the 'G' rated dataframe demo_g and the 'R' rated dataframe demo_r into a single dataframe named demo_rest. It holds the rest of the new movies that will be added to the demo_drama dataframe to complete the Drama dataframe for this analysis.
# pd.concat replaces the deprecated DataFrame.append (assumes pandas was imported as pd earlier in the notebook)
demo_rest = pd.concat([demo_nc, demo_g, demo_r], ignore_index=True)
Checking the dataframe and getting the first five rows of the dataframe
demo_rest.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | NC-17 | Drama | 1986 | March 7, 1986 (Spain) | 7.0 | 11000.0 | Pedro Almodóvar | Pedro Almodóvar | Assumpta Serna | Spain | NaN | 286126.0 | Compañía Iberoamericana de TV | 110.0 |
| 1 | Whore | NC-17 | Drama | 1991 | October 18, 1991 (United States) | 5.6 | 3500.0 | Ken Russell | David Hines | Theresa Russell | United States | NaN | 1008404.0 | Cheap Date | 85.0 |
| 2 | Tokyo Decadence | NC-17 | Drama | 1992 | April 30, 1993 (United States) | 6.0 | 3000.0 | Ryû Murakami | Ryû Murakami | Miho Nikaido | Japan | NaN | 277845.0 | Cinemabrain | 112.0 |
| 3 | Wide Sargasso Sea | NC-17 | Drama | 1993 | April 16, 1993 (United States) | 5.7 | 1900.0 | John Duigan | Jan Sharp | Karina Lombard | Australia | NaN | 1614784.0 | Laughing Kookaburra Productions | 98.0 |
| 4 | Kids | NC-17 | Drama | 1995 | September 1, 1995 (United States) | 7.1 | 75000.0 | Larry Clark | Harmony Korine | Leo Fitzpatrick | United States | 1500000.0 | 7412216.0 | Guys Upstairs | 91.0 |
Dropping the rows that have 'NaN' in the budget column of demo_rest and for which no budget data could be found online, then resetting the index of the demo_rest dataframe after the drop.
demo_rest = demo_rest.drop(labels=[12,16,19,23,27], axis=0)
demo_rest = demo_rest.reset_index(drop=True)
Checking the dataframe and getting the first five rows of the dataframe
demo_rest.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | NC-17 | Drama | 1986 | March 7, 1986 (Spain) | 7.0 | 11000.0 | Pedro Almodóvar | Pedro Almodóvar | Assumpta Serna | Spain | NaN | 286126.0 | Compañía Iberoamericana de TV | 110.0 |
| 1 | Whore | NC-17 | Drama | 1991 | October 18, 1991 (United States) | 5.6 | 3500.0 | Ken Russell | David Hines | Theresa Russell | United States | NaN | 1008404.0 | Cheap Date | 85.0 |
| 2 | Tokyo Decadence | NC-17 | Drama | 1992 | April 30, 1993 (United States) | 6.0 | 3000.0 | Ryû Murakami | Ryû Murakami | Miho Nikaido | Japan | NaN | 277845.0 | Cinemabrain | 112.0 |
| 3 | Wide Sargasso Sea | NC-17 | Drama | 1993 | April 16, 1993 (United States) | 5.7 | 1900.0 | John Duigan | Jan Sharp | Karina Lombard | Australia | NaN | 1614784.0 | Laughing Kookaburra Productions | 98.0 |
| 4 | Kids | NC-17 | Drama | 1995 | September 1, 1995 (United States) | 7.1 | 75000.0 | Larry Clark | Harmony Korine | Leo Fitzpatrick | United States | 1500000.0 | 7412216.0 | Guys Upstairs | 91.0 |
The 'budget' column in the demo_rest dataframe still contains 'NaN' values. The cell below replaces every 'NaN' in the demo_rest dataframe with '0'; the '0' placeholders are replaced with the actual budgets later on.
demo_rest = demo_rest.fillna(0)
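Calling fillna(0) on the whole dataframe fills missing values in every column, not only in budget. If only the budget column should receive the placeholder, a column-scoped fill is one option. This is a minimal sketch on a made-up two-row frame; the name `toy` and its values are illustrative assumptions, not data from demo_rest.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for demo_rest; values are illustrative only.
toy = pd.DataFrame({"budget": [np.nan, 1500000.0],
                    "runtime": [110.0, np.nan]})

# Fill only the budget column; missing values elsewhere stay NaN.
toy["budget"] = toy["budget"].fillna(0)
print(toy)
```

Keeping the other columns untouched makes it easier to see later which values were genuinely missing.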
Checking the dataframe and getting the first five rows of the dataframe
demo_rest.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | NC-17 | Drama | 1986 | March 7, 1986 (Spain) | 7.0 | 11000.0 | Pedro Almodóvar | Pedro Almodóvar | Assumpta Serna | Spain | 0.0 | 286126.0 | Compañía Iberoamericana de TV | 110.0 |
| 1 | Whore | NC-17 | Drama | 1991 | October 18, 1991 (United States) | 5.6 | 3500.0 | Ken Russell | David Hines | Theresa Russell | United States | 0.0 | 1008404.0 | Cheap Date | 85.0 |
| 2 | Tokyo Decadence | NC-17 | Drama | 1992 | April 30, 1993 (United States) | 6.0 | 3000.0 | Ryû Murakami | Ryû Murakami | Miho Nikaido | Japan | 0.0 | 277845.0 | Cinemabrain | 112.0 |
| 3 | Wide Sargasso Sea | NC-17 | Drama | 1993 | April 16, 1993 (United States) | 5.7 | 1900.0 | John Duigan | Jan Sharp | Karina Lombard | Australia | 0.0 | 1614784.0 | Laughing Kookaburra Productions | 98.0 |
| 4 | Kids | NC-17 | Drama | 1995 | September 1, 1995 (United States) | 7.1 | 75000.0 | Larry Clark | Harmony Korine | Leo Fitzpatrick | United States | 1500000.0 | 7412216.0 | Guys Upstairs | 91.0 |
Getting the index of every row whose budget is '0' in the demo_rest dataframe.
nan_index = []
for i, x in enumerate(demo_rest.budget):
    if x == 0.0:
        nan_index.append(i)
print(nan_index) #showing the nan_index list
[0, 1, 2, 3, 7, 9, 12, 13, 14, 15, 16, 18, 22, 23, 24, 26, 29, 30]
Checking the number of elements in the 'nan_index' list.
len(nan_index)
18
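The same positions can also be collected without an explicit loop. The sketch below uses a boolean mask on a made-up frame; `toy` and `zero_index` are hypothetical names, and it assumes a default RangeIndex such as demo_rest has after reset_index.

```python
import pandas as pd

# Hypothetical stand-in for demo_rest; values are illustrative only.
toy = pd.DataFrame({"movie": ["A", "B", "C", "D"],
                    "budget": [0.0, 1500000.0, 0.0, 0.0]})

# Boolean mask -> index labels of the rows whose budget is the 0 placeholder.
zero_index = toy.index[toy["budget"] == 0].tolist()
print(zero_index)
```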
The actual budgets of the movies in the demo_rest dataframe whose budget was labeled '0'.
budget = [12500000, 1000000, 20000, 955472, 5000000, 2734384,
4000000, 35446775, 700000, 8600000, 7000000, 4400000,
8500000, 20000000, 100000, 6500000, 11500000, 9000000]
print(budget) #showing the budget list
[12500000, 1000000, 20000, 955472, 5000000, 2734384, 4000000, 35446775, 700000, 8600000, 7000000, 4400000, 8500000, 20000000, 100000, 6500000, 11500000, 9000000]
Replacing every '0' element in the budget column of the demo_rest dataframe with the actual budget of the movie.
for i, x in enumerate(nan_index):
    demo_rest.loc[x, 'budget'] = budget[i]
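The row-by-row replacement above could also be written as a single .loc assignment. This is a sketch under the assumption that the list of positions and the list of budgets are in the same order; `toy`, `zero_index` and `actual_budget` are illustrative names and values.

```python
import pandas as pd

# Hypothetical stand-in for demo_rest; values are illustrative only.
toy = pd.DataFrame({"movie": ["A", "B", "C", "D"],
                    "budget": [0.0, 1500000.0, 0.0, 0.0]})
zero_index = [0, 2, 3]                       # rows holding the 0 placeholder
actual_budget = [12500000, 1000000, 20000]   # one budget per placeholder row

# Guard against a length mismatch before assigning.
assert len(zero_index) == len(actual_budget)

# Assign all budgets at once; values are matched to rows by position in the lists.
toy.loc[zero_index, "budget"] = actual_budget
print(toy)
```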
Showing the first five rows of demo_rest, the dataframe of 'G', 'R' and 'NC-17' rated movies, with the actual budgets now replacing the '0' values in the budget column.
demo_rest.head()
| | movie | rating | genre | year | released | score | votes | director | writer | star | country | budget | gross | company | runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | NC-17 | Drama | 1986 | March 7, 1986 (Spain) | 7.0 | 11000.0 | Pedro Almodóvar | Pedro Almodóvar | Assumpta Serna | Spain | 12500000.0 | 286126.0 | Compañía Iberoamericana de TV | 110.0 |
| 1 | Whore | NC-17 | Drama | 1991 | October 18, 1991 (United States) | 5.6 | 3500.0 | Ken Russell | David Hines | Theresa Russell | United States | 1000000.0 | 1008404.0 | Cheap Date | 85.0 |
| 2 | Tokyo Decadence | NC-17 | Drama | 1992 | April 30, 1993 (United States) | 6.0 | 3000.0 | Ryû Murakami | Ryû Murakami | Miho Nikaido | Japan | 20000.0 | 277845.0 | Cinemabrain | 112.0 |
| 3 | Wide Sargasso Sea | NC-17 | Drama | 1993 | April 16, 1993 (United States) | 5.7 | 1900.0 | John Duigan | Jan Sharp | Karina Lombard | Australia | 955472.0 | 1614784.0 | Laughing Kookaburra Productions | 98.0 |
| 4 | Kids | NC-17 | Drama | 1995 | September 1, 1995 (United States) | 7.1 | 75000.0 | Larry Clark | Harmony Korine | Leo Fitzpatrick | United States | 1500000.0 | 7412216.0 | Guys Upstairs | 91.0 |
This is the Worldwide Gross of the new 'G', 'R' and 'NC-17' rated movies. It will be added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis.
worldwide_rest = [17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059,
15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143,
1025228, 18587135, 8721243, 40300, 10015449, 80693537,
54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289,
3256082, 13000000, 11000000]
print(worldwide_rest) #showing the worldwide_rest list
[17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059, 15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143, 1025228, 18587135, 8721243, 40300, 10015449, 80693537, 54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289, 3256082, 13000000, 11000000]
Checking the number of elements in the 'worldwide_rest' list.
len(worldwide_rest)
31
This is the Domestic Gross of the new 'G', 'R' and 'NC-17' rated movies. It will be added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis.
domestic_rest = [12594698, 0, 0, 0, 7412216, 0, 54580300, 0, 2532228, 71616, 4604982,
4002293, 2199787, 0, 0, 0, 0, 0, 0, 0, 75600072, 0, 22455510, 23438250,
1596371, 0, 0, 0, 0, 0, 0]
print(domestic_rest) #showing the domestic_rest list
[12594698, 0, 0, 0, 7412216, 0, 54580300, 0, 2532228, 71616, 4604982, 4002293, 2199787, 0, 0, 0, 0, 0, 0, 0, 75600072, 0, 22455510, 23438250, 1596371, 0, 0, 0, 0, 0, 0]
Checking the number of elements in the 'domestic_rest' list.
len(domestic_rest)
31
This is the Foreign Gross of the new 'G', 'R' and 'NC-17' rated movies. It will be added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis. It is calculated by subtracting the Domestic Gross from the Worldwide Gross of each movie; where the Domestic Gross is 0, the Foreign Gross is also set to 0.
foreign_rest = []
for i, x in enumerate(worldwide_rest):
    if domestic_rest[i] == 0:
        foreign_rest.append(0)
    else:
        foreign_rest.append(x - domestic_rest[i])
print(foreign_rest) #showing the foreign_rest list
[4761570, 0, 0, 0, 13000000, 0, 43829761, 0, 12588937, 950532, 62486933, 16410548, 17266048, 0, 0, 0, 0, 0, 0, 0, 5093465, 0, 54756326, 11279923, 355312, 0, 0, 0, 0, 0, 0]
Checking the number of elements in the 'foreign_rest' list.
len(foreign_rest)
31
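The rule used for the Foreign Gross (0 when the Domestic Gross is 0, otherwise Worldwide Gross minus Domestic Gross) can be sketched with numpy.where. The three value pairs below are the first, second and fifth entries of worldwide_rest and domestic_rest; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# First, second and fifth entries of worldwide_rest / domestic_rest.
worldwide = pd.Series([17356268, 1008404, 20412216])
domestic = pd.Series([12594698, 0, 7412216])

# 0 where no domestic figure is recorded, otherwise the difference.
foreign = np.where(domestic == 0, 0, worldwide - domestic)
foreign_list = foreign.tolist()
print(foreign_list)
```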
This is the Profit of the new 'G', 'R' and 'NC-17' rated movies. It will be added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis. It is calculated by subtracting the Budget of each movie from its Worldwide Gross.
profit_rest = []
for i, x in enumerate(worldwide_rest):
    profit_rest.append(x - demo_rest.budget[i])
print(profit_rest) #showing the profit_rest list
[4856268.0, 8404.0, 257845.0, 659312.0, 18912216.0, -24649246.0, 89410061.0, -4503941.0, 121165.0, -1712236.0, 52091915.0, 13912841.0, 15465835.0, -35251281.0, 1711143.0, -7574772.0, 11587135.0, -9278757.0, -4359700.0, -6984551.0, 58693537.0, 48766923.0, 68711836.0, 14718173.0, 1851683.0, -25363204.0, -4052424.0, -12828711.0, 556082.0, 1500000.0, 2000000.0]
Checking the number of elements in the 'profit_rest' list.
len(profit_rest)
31
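As a worked check of the profit formula, the sketch below subtracts the budget from the Worldwide Gross as one column operation. It uses the first three movies' figures from worldwide_rest and the budget column; `toy` is a hypothetical frame, and the approach assumes the gross values are in the same row order as the dataframe.

```python
import pandas as pd

# First three movies: worldwide gross and budget as listed above.
toy = pd.DataFrame({"Worldwide_Gross": [17356268, 1008404, 277845],
                    "budget": [12500000.0, 1000000.0, 20000.0]})

# Profit = Worldwide Gross - Budget, computed for every row at once.
toy["Profit"] = toy["Worldwide_Gross"] - toy["budget"]
print(toy["Profit"].tolist())
```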
This is the estimated number of Tickets sold for the new 'G', 'R' and 'NC-17' rated movies. It will be added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis. It is calculated by dividing the Worldwide Gross of each movie by 10 and rounding to a whole number, which implies an average ticket price of $10.
no_tickets_rest = []
for i in worldwide_rest:
    no_tickets_rest.append(round(i / 10))
print(no_tickets_rest) #showing the no_tickets_rest list
[1735627, 100840, 27784, 161478, 2041222, 2035075, 9841006, 49606, 1512116, 102215, 6709192, 2041284, 1946584, 19549, 241114, 102523, 1858714, 872124, 4030, 1001545, 8069354, 5476692, 7721184, 3471817, 195168, 63680, 244758, 917129, 325608, 1300000, 1100000]
Checking the number of elements in the 'no_tickets_rest' list.
len(no_tickets_rest)
31
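The ticket estimate can be wrapped in a small helper that makes the price assumption explicit. This is a sketch only: `estimate_tickets` and `ticket_price` are hypothetical names, and the default of 10 simply mirrors the division used above.

```python
def estimate_tickets(gross_values, ticket_price=10):
    """Estimate tickets sold as gross / ticket_price, rounded to a whole number."""
    return [round(g / ticket_price) for g in gross_values]

# First three entries of worldwide_rest.
tickets = estimate_tickets([17356268, 1008404, 277845])
print(tickets)
```

Python's round() rounds exact halves to the nearest even number, which is why 27,784.5 becomes 27,784 rather than 27,785.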
Creating the Worldwide_Gross_x column by formatting the worldwide gross list (worldwide_rest) as currency strings, to be added to the demo_rest dataframe.
worldwide_gross_restx = []
for i in worldwide_rest:
    worldwide_gross_restx.append("${:,.0f}".format(i))
print(worldwide_gross_restx) #showing the worldwide_gross_restx list
['$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$20,350,754', '$98,410,061', '$496,059', '$15,121,165', '$1,022,148', '$67,091,915', '$20,412,841', '$19,465,835', '$195,494', '$2,411,143', '$1,025,228', '$18,587,135', '$8,721,243', '$40,300', '$10,015,449', '$80,693,537', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$636,796', '$2,447,576', '$9,171,289', '$3,256,082', '$13,000,000', '$11,000,000']
Checking the number of elements in the 'worldwide_gross_restx' list.
len(worldwide_gross_restx)
31
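The same currency formatting can be applied to a whole column with Series.map. A minimal sketch, using two values from worldwide_rest and one negative profit figure from the tables above to show how the sign is rendered; `gross` and `gross_x` are illustrative names.

```python
import pandas as pd

gross = pd.Series([17356268, 1008404, -7365642])

# "${:,.0f}" adds a dollar sign, thousands separators and no decimals.
gross_x = gross.map("${:,.0f}".format)
print(gross_x.tolist())
```

With this format string the minus sign of a negative value appears after the dollar sign, as in the Profit_x column.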
Creating the Domestic_Gross_x column by formatting the domestic gross list (domestic_rest) as currency strings, to be added to the demo_rest dataframe.
domestic_gross_restx = []
for i in domestic_rest:
    domestic_gross_restx.append("${:,.0f}".format(i))
print(domestic_gross_restx) #showing the domestic_gross_restx list
['$12,594,698', '$0', '$0', '$0', '$7,412,216', '$0', '$54,580,300', '$0', '$2,532,228', '$71,616', '$4,604,982', '$4,002,293', '$2,199,787', '$0', '$0', '$0', '$0', '$0', '$0', '$0', '$75,600,072', '$0', '$22,455,510', '$23,438,250', '$1,596,371', '$0', '$0', '$0', '$0', '$0', '$0']
Checking the number of elements in the 'domestic_gross_restx' list.
len(domestic_gross_restx)
31
Creating the Foreign_Gross_x column by formatting the foreign gross list (foreign_rest) as currency strings, to be added to the demo_rest dataframe.
foreign_gross_restx = []
for i in foreign_rest:
    foreign_gross_restx.append("${:,.0f}".format(i))
print(foreign_gross_restx) #showing the foreign_gross_restx list
['$4,761,570', '$0', '$0', '$0', '$13,000,000', '$0', '$43,829,761', '$0', '$12,588,937', '$950,532', '$62,486,933', '$16,410,548', '$17,266,048', '$0', '$0', '$0', '$0', '$0', '$0', '$0', '$5,093,465', '$0', '$54,756,326', '$11,279,923', '$355,312', '$0', '$0', '$0', '$0', '$0', '$0']
Checking the number of elements in the 'foreign_gross_restx' list.
len(foreign_gross_restx)
31
Creating the Profit_x column by formatting the profit list (profit_rest) as currency strings, to be added to the demo_rest dataframe.
profit_restx = []
for i in profit_rest:
    profit_restx.append("${:,.0f}".format(i))
print(profit_restx) #showing the profit_restx list
['$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$-24,649,246', '$89,410,061', '$-4,503,941', '$121,165', '$-1,712,236', '$52,091,915', '$13,912,841', '$15,465,835', '$-35,251,281', '$1,711,143', '$-7,574,772', '$11,587,135', '$-9,278,757', '$-4,359,700', '$-6,984,551', '$58,693,537', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$-25,363,204', '$-4,052,424', '$-12,828,711', '$556,082', '$1,500,000', '$2,000,000']
Checking the number of elements in the 'profit_restx' list.
len(profit_restx)
31
Creating the Tickets_x column by formatting the ticket list (no_tickets_rest) as strings with thousands separators (no currency symbol), to be added to the demo_rest dataframe.
str_tickets_restx = []
for i in no_tickets_rest:
    str_tickets_restx.append("{:,.0f}".format(i))
print(str_tickets_restx) #showing the str_tickets_restx list
['1,735,627', '100,840', '27,784', '161,478', '2,041,222', '2,035,075', '9,841,006', '49,606', '1,512,116', '102,215', '6,709,192', '2,041,284', '1,946,584', '19,549', '241,114', '102,523', '1,858,714', '872,124', '4,030', '1,001,545', '8,069,354', '5,476,692', '7,721,184', '3,471,817', '195,168', '63,680', '244,758', '917,129', '325,608', '1,300,000', '1,100,000']
Checking the number of elements in the 'str_tickets_restx' list.
len(str_tickets_restx)
31
Creating the Production_Budget_x column by formatting the demo_rest budget column as currency strings, to be added to the demo_rest dataframe.
str_budget_restx = []
for i in demo_rest.budget:
    str_budget_restx.append("${:,.0f}".format(i))
print(str_budget_restx) #showing the str_budget_restx list
['$12,500,000', '$1,000,000', '$20,000', '$955,472', '$1,500,000', '$45,000,000', '$9,000,000', '$5,000,000', '$15,000,000', '$2,734,384', '$15,000,000', '$6,500,000', '$4,000,000', '$35,446,775', '$700,000', '$8,600,000', '$7,000,000', '$18,000,000', '$4,400,000', '$17,000,000', '$22,000,000', '$6,000,000', '$8,500,000', '$20,000,000', '$100,000', '$26,000,000', '$6,500,000', '$22,000,000', '$2,700,000', '$11,500,000', '$9,000,000']
Checking the number of elements in the 'str_budget_restx' list.
len(str_budget_restx)
31
Deleting four columns from the demo_rest dataframe to align it with the demo_drama dataframe and make it suitable for merging, to complete the dataframe for this analysis.
demo_rest = demo_rest.drop(['year', 'votes', 'country', 'gross'], axis=1)
After creating all the final columns, they are added to the demo_rest dataframe, which is later appended to the demo_drama dataframe to complete the dataframe for this analysis.
demo_rest['Worldwide_Gross'] = worldwide_rest
demo_rest["Foreign_Gross"] = foreign_rest
demo_rest['Domestic_Gross'] = domestic_rest
demo_rest["Profit"] = profit_rest
demo_rest['Tickets'] = no_tickets_rest
demo_rest['Worldwide_Gross_x'] = worldwide_gross_restx
demo_rest["Foreign_Gross_x"] = foreign_gross_restx
demo_rest['Domestic_Gross_x'] = domestic_gross_restx
demo_rest["Profit_x"] = profit_restx
demo_rest['Tickets_x'] = str_tickets_restx
demo_rest['Production_Budget_x'] = str_budget_restx
Rearranging the columns in demo_rest dataframe to align with demo_drama dataframe to be suitable for appending, creating the final dataframe Drama_DataFrame for this analysis
demo_rest = demo_rest[['movie','released','genre','rating','budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','runtime','score',
'company','star','director','writer']]
Checking the dataframe and getting the first five rows of the dataframe
demo_rest.head()
| | movie | released | genre | rating | budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | runtime | score | company | star | director | writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | March 7, 1986 (Spain) | Drama | NC-17 | 12500000.0 | $12,500,000 | 12594698 | $12,594,698 | 4761570 | $4,761,570 | ... | 4856268.0 | $4,856,268 | 1735627 | 1,735,627 | 110.0 | 7.0 | Compañía Iberoamericana de TV | Assumpta Serna | Pedro Almodóvar | Pedro Almodóvar |
| 1 | Whore | October 18, 1991 (United States) | Drama | NC-17 | 1000000.0 | $1,000,000 | 0 | $0 | 0 | $0 | ... | 8404.0 | $8,404 | 100840 | 100,840 | 85.0 | 5.6 | Cheap Date | Theresa Russell | Ken Russell | David Hines |
| 2 | Tokyo Decadence | April 30, 1993 (United States) | Drama | NC-17 | 20000.0 | $20,000 | 0 | $0 | 0 | $0 | ... | 257845.0 | $257,845 | 27784 | 27,784 | 112.0 | 6.0 | Cinemabrain | Miho Nikaido | Ryû Murakami | Ryû Murakami |
| 3 | Wide Sargasso Sea | April 16, 1993 (United States) | Drama | NC-17 | 955472.0 | $955,472 | 0 | $0 | 0 | $0 | ... | 659312.0 | $659,312 | 161478 | 161,478 | 98.0 | 5.7 | Laughing Kookaburra Productions | Karina Lombard | John Duigan | Jan Sharp |
| 4 | Kids | September 1, 1995 (United States) | Drama | NC-17 | 1500000.0 | $1,500,000 | 7412216 | $7,412,216 | 13000000 | $13,000,000 | ... | 18912216.0 | $18,912,216 | 2041222 | 2,041,222 | 91.0 | 7.1 | Guys Upstairs | Leo Fitzpatrick | Larry Clark | Harmony Korine |
5 rows × 22 columns
Renaming the columns in the demo_rest dataframe to align with demo_drama dataframe to be suitable for appending, creating the final dataframe Drama_DataFrame for this analysis
demo_rest.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
'Company','Star','Director','Writer']
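Assigning a full list to .columns renames by position, so it depends on the columns already being in the expected order. An alternative is to rename by name with a mapping. The sketch below shows this on a one-row frame; `toy` and `name_map` are hypothetical names, and only four of the columns are included for brevity.

```python
import pandas as pd

# Hypothetical one-row frame with a few of the original column names.
toy = pd.DataFrame({"movie": ["Kids"],
                    "released": ["September 1, 1995 (United States)"],
                    "budget": [1500000.0],
                    "score": [7.1]})

# Old name -> new name; columns not listed would keep their names.
name_map = {"movie": "Movie",
            "released": "Release_Date",
            "budget": "Production_Budget",
            "score": "Averagerating"}

toy = toy.rename(columns=name_map)
print(list(toy.columns))
```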
Checking the first five rows of the demo_rest dataframe to make sure it aligns with the demo_drama dataframe before appending the two dataframes to each other to create Drama_DataFrame, the final dataframe that will be used for this analysis.
demo_rest.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Matador | March 7, 1986 (Spain) | Drama | NC-17 | 12500000.0 | $12,500,000 | 12594698 | $12,594,698 | 4761570 | $4,761,570 | ... | 4856268.0 | $4,856,268 | 1735627 | 1,735,627 | 110.0 | 7.0 | Compañía Iberoamericana de TV | Assumpta Serna | Pedro Almodóvar | Pedro Almodóvar |
| 1 | Whore | October 18, 1991 (United States) | Drama | NC-17 | 1000000.0 | $1,000,000 | 0 | $0 | 0 | $0 | ... | 8404.0 | $8,404 | 100840 | 100,840 | 85.0 | 5.6 | Cheap Date | Theresa Russell | Ken Russell | David Hines |
| 2 | Tokyo Decadence | April 30, 1993 (United States) | Drama | NC-17 | 20000.0 | $20,000 | 0 | $0 | 0 | $0 | ... | 257845.0 | $257,845 | 27784 | 27,784 | 112.0 | 6.0 | Cinemabrain | Miho Nikaido | Ryû Murakami | Ryû Murakami |
| 3 | Wide Sargasso Sea | April 16, 1993 (United States) | Drama | NC-17 | 955472.0 | $955,472 | 0 | $0 | 0 | $0 | ... | 659312.0 | $659,312 | 161478 | 161,478 | 98.0 | 5.7 | Laughing Kookaburra Productions | Karina Lombard | John Duigan | Jan Sharp |
| 4 | Kids | September 1, 1995 (United States) | Drama | NC-17 | 1500000.0 | $1,500,000 | 7412216 | $7,412,216 | 13000000 | $13,000,000 | ... | 18912216.0 | $18,912,216 | 2041222 | 2,041,222 | 91.0 | 7.1 | Guys Upstairs | Leo Fitzpatrick | Larry Clark | Harmony Korine |
5 rows × 22 columns
Checking the first five rows of the demo_drama dataframe to make sure it aligns with the demo_rest dataframe before appending the two dataframes to each other to create Drama_DataFrame, the final dataframe that will be used for this analysis.
demo_drama.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | ... | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | ... | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | ... | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | ... | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
5 rows × 22 columns
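Comparing the two head() outputs by eye works, but the alignment can also be checked in code before combining. This is a sketch on two made-up one-row frames; `left`, `right`, `same_columns` and `dtype_mismatch` are illustrative names standing in for demo_drama and demo_rest.

```python
import pandas as pd

# Hypothetical stand-ins for demo_drama and demo_rest (three columns only).
left = pd.DataFrame({"Movie": ["Hugo"], "Rating": ["PG"], "Profit": [47784.0]})
right = pd.DataFrame({"Movie": ["Kids"], "Rating": ["NC-17"], "Profit": [18912216.0]})

# Same column names in the same order?
same_columns = list(left.columns) == list(right.columns)

# Columns whose dtypes differ between the two frames.
dtype_mismatch = [c for c in left.columns if left[c].dtype != right[c].dtype]

# Combine only once the frames line up.
if same_columns and not dtype_mismatch:
    combined = pd.concat([left, right], ignore_index=True)
    print(combined)
```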
Appending the demo_rest dataframe to the demo_drama dataframe to create Drama_DataFrame, the final dataframe that will be used for this analysis.
# pd.concat replaces the deprecated DataFrame.append (assumes pandas was imported as pd earlier in the notebook)
Drama_DataFrame = pd.concat([demo_drama, demo_rest], ignore_index=True)
Rearranging the columns in Drama_DataFrame.
Drama_DataFrame = Drama_DataFrame[['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross_x',
'Worldwide_Gross','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
'Company','Star','Director','Writer']]
Renaming the columns in the Drama_DataFrame.
Drama_DataFrame.columns = ['Movie','Release_Date','Genre','Rating','Production_Budget','Production_Budget_x',
'Domestic_Gross','Domestic_Gross_x','Foreign_Gross','Foreign_Gross_x','Worldwide_Gross',
'Worldwide_Gross_x','Profit','Profit_x','Tickets','Tickets_x','Runtime','Averagerating',
'Company','Star','Director','Writer']
The new 'Drama_DataFrame' dataframe.
Drama_DataFrame.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | ... | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | ... | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | ... | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | ... | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | ... | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
5 rows × 22 columns
The 'G' rating does not have enough movies to be analyzed, so more 'G' rated movies will be added to the Drama_DataFrame dataframe for appropriate analysis.
These are the names of the new 'G-rated' movies that will be stored in the 'g_name' list, for the 'Movie' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_name = ['Beauty and the Beast 1991' , 'The Little Rascals', 'Ramona and Beezus',
'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna',
'Babe: Pig in the City', 'Lassie Come Home', 'Charlotte\'s Web', 'A Little Princess',
'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music',
'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964',
'Before the Wrath', 'Hachiko: A Dog\'s Story', 'Giant', 'The Ten Commandments 1966',
'The Quiet Man', 'Three Cions in the Fountain', 'Miracle of Marcelino']
The 'g_name' list.
print(g_name)
['Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna', 'Babe: Pig in the City', 'Lassie Come Home', "Charlotte's Web", 'A Little Princess', 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', 'Before the Wrath', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Cions in the Fountain', 'Miracle of Marcelino']
Checking the number of elements in the 'g_name' list.
len(g_name)
26
These are the writers of the new 'G-rated' movies that will be stored in the 'g_writer' list, for the 'Writer' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_writer = ['Linda Woolverton','Penelope Spheeris','Beverly Cleary','Melissa Mathison','Victor Hugo',
'Chris Noonan','Eleanor Hodgman Porter','Judy Morris','Eric Knight','E. B. White',
'Frances Hodgson Burnett','Valerie Tripp','Mike Rich','Caroline Thompson','Ernest Lehman',
'Kate DiCamillo','Jonathan Roberts','Perce Pearce','Alan Jay Lerner','Brent Miller Jr.',
'Stephen P. Lindsey','Edna Ferber',' Fredric M. Frank','Frank S. Nugent','John Patrick',
'José María Sánchez-Silva']
The 'g_writer' list.
print(g_writer)
['Linda Woolverton', 'Penelope Spheeris', 'Beverly Cleary', 'Melissa Mathison', 'Victor Hugo', 'Chris Noonan', 'Eleanor Hodgman Porter', 'Judy Morris', 'Eric Knight', 'E. B. White', 'Frances Hodgson Burnett', 'Valerie Tripp', 'Mike Rich', 'Caroline Thompson', 'Ernest Lehman', 'Kate DiCamillo', 'Jonathan Roberts', 'Perce Pearce', 'Alan Jay Lerner', 'Brent Miller Jr.', 'Stephen P. Lindsey', 'Edna Ferber', ' Fredric M. Frank', 'Frank S. Nugent', 'John Patrick', 'José María Sánchez-Silva']
Checking the number of elements in the 'g_writer' list.
len(g_writer)
26
These are the release dates of the new 'G-rated' movies that will be stored in the 'g_date' list, for the 'Release_Date' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_date = ['Novemeber 22, 1991', 'August 5, 1994', 'July 23, 2010', 'October 17, 1979', 'June 21, 1996',
'August 4, 1995', 'May 19, 1960', 'November 25, 1998', 'December 1943', 'October 15, 1952',
'May 10, 1995', 'July 2, 2008', 'March 29, 2002', 'April 13 1993', 'April 1, 1993',
'December 19, 2008', 'June 24, 1994', 'August 21, 1942', 'December 25, 1964', 'March 3, 2020',
'March 12, 2010', 'Novemebr 24, 1952', 'October 5, 1956', 'Septemebr 14, 1954', 'May 15, 1954',
'October 22, 1956']
The 'g_date' list.
print(g_date)
['Novemeber 22, 1991', 'August 5, 1994', 'July 23, 2010', 'October 17, 1979', 'June 21, 1996', 'August 4, 1995', 'May 19, 1960', 'November 25, 1998', 'December 1943', 'October 15, 1952', 'May 10, 1995', 'July 2, 2008', 'March 29, 2002', 'April 13 1993', 'April 1, 1993', 'December 19, 2008', 'June 24, 1994', 'August 21, 1942', 'December 25, 1964', 'March 3, 2020', 'March 12, 2010', 'Novemebr 24, 1952', 'October 5, 1956', 'Septemebr 14, 1954', 'May 15, 1954', 'October 22, 1956']
Checking the number of elements in the 'g_date' list.
len(g_date)
26
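Because the release dates are typed by hand as free text, a format check can catch entries that would not parse later. The following is a minimal sketch, assuming the intended format is 'Month D, YYYY'. The helper name `find_unparseable_dates` and the short `sample_dates` list (a few entries copied from g_date) are illustrative only and are not part of the original notebook.

```python
from datetime import datetime

def find_unparseable_dates(dates, fmt="%B %d, %Y"):
    """Return the entries that do not match the expected date format."""
    bad = []
    for text in dates:
        try:
            # strip() ignores stray leading/trailing whitespace
            datetime.strptime(text.strip(), fmt)
        except ValueError:
            bad.append(text)
    return bad

# A few entries copied from the g_date list above
sample_dates = ['Novemeber 22, 1991', 'August 5, 1994', 'December 1943',
                'April 13 1993', 'March 3, 2020']
bad_dates = find_unparseable_dates(sample_dates)
print(bad_dates)
```

Entries flagged this way (a misspelled month, a missing day, or a missing comma) can then be corrected by hand before the column is used for any date-based analysis.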
These are the production budgets of the new 'G-rated' movies that will be stored in the 'g_budget' list, for the 'Production_Budget' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_budget = [20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 90000000, 666000, 85000000,
17000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 300000,
10000000, 6400000, 13000000, 1750000, 1700000, 3000000]
The 'g_budget' list.
print(g_budget)
[20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 90000000, 666000, 85000000, 17000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 300000, 10000000, 6400000, 13000000, 1750000, 1700000, 3000000]
Checking the number of elements in the 'g_budget' list.
len(g_budget)
26
These are the domestic gross figures of the new 'G-rated' movies that will be stored in the 'g_domestic' list, for the 'Domestic_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_domestic =[206333165, 51764950, 25167002, 0, 100138851, 63658910, 0, 18319860, 0, 82985708, 0, 0,
75600072, 0, 163214286, 50877145, 421785283, 102797000, 72000000, 0, 0, 30176619, 0, 7600000,
5000000, 0]
The 'g_domestic' list.
print(g_domestic)
[206333165, 51764950, 25167002, 0, 100138851, 63658910, 0, 18319860, 0, 82985708, 0, 0, 75600072, 0, 163214286, 50877145, 421785283, 102797000, 72000000, 0, 0, 30176619, 0, 7600000, 5000000, 0]
Checking the number of elements in the 'g_domestic' list.
len(g_domestic)
26
These are the foreign gross figures of the new 'G-rated' movies that will be stored in the 'g_foreign' list, for the 'Foreign_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_foreign = [232323678, 15183000, 1302619, 0, 225361149, 182441090, 0, 50812000, 0, 61000000, 0,
0, 4891444, 0, 122999909, 39605172, 564429585, 165203000, 71636, 0, 0, 15790, 0, 377,
7000000, 0]
The 'g_foreign' list.
print(g_foreign)
[232323678, 15183000, 1302619, 0, 225361149, 182441090, 0, 50812000, 0, 61000000, 0, 0, 4891444, 0, 122999909, 39605172, 564429585, 165203000, 71636, 0, 0, 15790, 0, 377, 7000000, 0]
Checking the number of elements in the 'g_foreign' list.
len(g_foreign)
26
These are the worldwide gross figures of the new 'G-rated' movies for the 'Worldwide_Gross' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_worldwide = [438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860,
4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868,
268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861]
The 'g_worldwide' list.
print(g_worldwide)
[438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860, 4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861]
Checking the number of elements in the 'g_worldwide' list.
len(g_worldwide)
26
These are the runtimes of the new 'G-rated' movies for the 'Runtime' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_runtime = [84, 83, 103, 118, 91, 91, 134, 90, 89, 97, 97, 105, 128, 101, 174, 87, 87, 70, 175, 84, 90,
200, 220, 129, 102, 91]
The 'g_runtime' list.
print(g_runtime)
[84, 83, 103, 118, 91, 91, 134, 90, 89, 97, 97, 105, 128, 101, 174, 87, 87, 70, 175, 84, 90, 200, 220, 129, 102, 91]
Checking the number of elements in the 'g_runtime' list.
len(g_runtime)
26
These are the average ratings of the new 'G-rated' movies for the 'Averagerating' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_rating = [8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8,
6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]
The 'g_rating' list.
print(g_rating)
[8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]
Checking the number of elements in the 'g_rating' list.
len(g_rating)
26
These are the names of the production companies of the new 'G-rated' movies for the 'Company' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_company = ['The Walt Disney Company','Universal Pictures', 'Fox 2000 Pictures', 'Omni Zoetrope',
'The Walt Disney Company', 'Universal Pictures', 'The Walt Disney Company',
'Universal Pictures', 'Metro-Goldwyn-Mayer', 'Hanna-Barber-Productions',
'Warner Bros. Pictures', 'Picturehouse', 'The Walt Disney Company', 'American Zoetrope',
'20th Century Studios', 'Universal Pictures', 'The Walt Disney Company',
'The Walt Disney Company', 'Warner Bros. Pictures', 'NaN', 'Inferno Distribution',
'Warner Bros. Pictures', 'Motion Picture Associates', 'Republic Pictures',
'20th Century Studios', 'United Motion Pictures']
The 'g_company' list.
print(g_company)
['The Walt Disney Company', 'Universal Pictures', 'Fox 2000 Pictures', 'Omni Zoetrope', 'The Walt Disney Company', 'Universal Pictures', 'The Walt Disney Company', 'Universal Pictures', 'Metro-Goldwyn-Mayer', 'Hanna-Barber-Productions', 'Warner Bros. Pictures', 'Picturehouse', 'The Walt Disney Company', 'American Zoetrope', '20th Century Studios', 'Universal Pictures', 'The Walt Disney Company', 'The Walt Disney Company', 'Warner Bros. Pictures', 'NaN', 'Inferno Distribution', 'Warner Bros. Pictures', 'Motion Picture Associates', 'Republic Pictures', '20th Century Studios', 'United Motion Pictures']
Checking the number of elements in the 'g_company' list.
len(g_company)
26
These are the names of the directors of the new 'G-rated' movies for the 'Director' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_director = ['Gary Trousdale', 'Penelope Spheeris', 'Elizabeth Allen Ressenbaum', 'Carral Ballard',
'Gary Trouside', 'Chris Noonan', 'David Swift', 'George Miller', 'Fred M. Wilcox',
'Gary Winick', 'Alfonso Cuaron', 'Patricia Rozema', 'John Lee Hancock', 'Agnieszka Holland',
'Robert Wise', 'Sam Fell', 'Rob Minkoff', 'John Hubley', 'Cecil Beaton', 'Brent Miller',
'Lasse Hallstrom', 'George Stevens', 'Cecil B. Demille', 'John Ford', 'Jean Negulesco',
'Ladislao Vajda']
The 'g_director' list.
print(g_director)
['Gary Trousdale', 'Penelope Spheeris', 'Elizabeth Allen Ressenbaum', 'Carral Ballard', 'Gary Trouside', 'Chris Noonan', 'David Swift', 'George Miller', 'Fred M. Wilcox', 'Gary Winick', 'Alfonso Cuaron', 'Patricia Rozema', 'John Lee Hancock', 'Agnieszka Holland', 'Robert Wise', 'Sam Fell', 'Rob Minkoff', 'John Hubley', 'Cecil Beaton', 'Brent Miller', 'Lasse Hallstrom', 'George Stevens', 'Cecil B. Demille', 'John Ford', 'Jean Negulesco', 'Ladislao Vajda']
Checking the number of elements in the 'g_director' list.
len(g_director)
26
These are the names of the starring actors of the new 'G-rated' movies for the 'Star' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_star = ['Paige O\'Hara', 'Brittany Ashton', 'Joey King', 'Kelly Reno', 'Demi Moore', 'James Cromwell',
'Hayley Mills', 'James Cromwell', 'Roddy McDowall', 'Dokota Fanning', 'Liesel Matthews',
'Abigail Breslin', 'Dennis Quaid', 'Kate Maberly', 'Julie Andrews', 'Matthew Brodewick',
'James Earl Jones', 'Donnie Dunagan', 'Audley Hepburn', 'Gemma Rizzuto', 'Richard Gere',
'James Dean', 'Yul Brynner', 'John Wayne', 'Jean Peters', 'Pablito Calvo']
Checking the number of elements in the 'g_star' list.
len(g_star)
26
The 'g_star' list.
print(g_star)
["Paige O'Hara", 'Brittany Ashton', 'Joey King', 'Kelly Reno', 'Demi Moore', 'James Cromwell', 'Hayley Mills', 'James Cromwell', 'Roddy McDowall', 'Dokota Fanning', 'Liesel Matthews', 'Abigail Breslin', 'Dennis Quaid', 'Kate Maberly', 'Julie Andrews', 'Matthew Brodewick', 'James Earl Jones', 'Donnie Dunagan', 'Audley Hepburn', 'Gemma Rizzuto', 'Richard Gere', 'James Dean', 'Yul Brynner', 'John Wayne', 'Jean Peters', 'Pablito Calvo']
This is for the 'Genre' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_genre = []
for i in range(26):g_genre.append('Drama')
print(g_genre) #showing the g_genre list
['Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama']
Checking the number of elements in the 'g_genre' list.
len(g_genre)
26
This is for the 'Rating' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame.
g_rated = []
for i in range(26):g_rated.append('G')
print(g_rated) #showing the g_rated list
['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']
Checking the number of elements in the 'g_rated' list.
len(g_rated)
26
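Constant columns such as 'Genre' and 'Rating' can also be built by list repetition, with the length taken from an existing list instead of a hard-coded 26. This is only a sketch: the `_sample` names are hypothetical, and the three titles are copied from the first rows of the data.

```python
# Three titles copied from the data, standing in for the full g_name list
g_name_sample = ['Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus']

# Derive the length from the data so the constant columns always line up
n_movies = len(g_name_sample)
g_genre_sample = ['Drama'] * n_movies
g_rated_sample = ['G'] * n_movies

print(g_genre_sample)
print(g_rated_sample)
```

Tying the length to `len(g_name_sample)` means the constant lists stay the right size if movies are later added to or removed from the source list.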
These are the profits of the new 'G-rated' movies for the 'Profit' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame. Each profit was calculated by subtracting the movie's Production Budget from its Worldwide Gross.
g_profit = []
for x,y in enumerate(g_worldwide):
    g_profit.append(y-g_budget[x])
print(g_profit) #showing the g_profit list
[418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, -20868140, 3851000, 58985708, -6984551, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, -191002, 37707417, 23794409, 52500000, 5850377, 10300000, -2407139]
Checking the number of elements in the 'g_profit' list.
len(g_profit)
26
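The same subtraction can be written by pairing the two lists with `zip`, which avoids indexing by position. This is a sketch rather than the notebook's own code: the function name `compute_profit` and the `_sample` lists are hypothetical, and the three value pairs are copied from g_worldwide and g_budget.

```python
def compute_profit(worldwide_gross, budgets):
    """Profit per movie: worldwide gross minus production budget."""
    if len(worldwide_gross) != len(budgets):
        raise ValueError("worldwide_gross and budgets must have the same length")
    return [gross - budget for gross, budget in zip(worldwide_gross, budgets)]

# Three (gross, budget) pairs copied from the lists above
worldwide_sample = [438656843, 66947950, 69131860]
budget_sample = [20000000, 23000000, 90000000]

profit_sample = compute_profit(worldwide_sample, budget_sample)
print(profit_sample)
```

A negative result, as for the third pair, means the movie grossed less worldwide than it cost to produce.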
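These are the estimated numbers of tickets sold for the new 'G-rated' movies for the 'Tickets' column in the new g_dataframe dataframe that will be appended to the Drama_DataFrame. Each estimate divides the movie's Worldwide Gross by 10, which implicitly assumes an average ticket price of $10, and rounds the result to a whole number.
g_tickets = []
for i in g_worldwide:
    g_tickets.append(round(i/10))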
print(g_tickets) #showing the g_tickets list
[43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 6913186, 451700, 14398571, 1001545, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 10900, 4770742, 3019441, 6550000, 760038, 1200000, 59286]
Checking the number of elements in the 'g_tickets' list.
len(g_tickets)
26
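The ticket estimate depends entirely on the divisor, so it can help to make that value an explicit, named parameter. The sketch below assumes the divisor of 10 stands for an average ticket price in dollars. `AVG_TICKET_PRICE`, `estimate_tickets`, and the `_sample` lists are hypothetical names, not part of the original notebook.

```python
# Assumption: the division by 10 above represents an average ticket price of $10
AVG_TICKET_PRICE = 10

def estimate_tickets(worldwide_gross, avg_ticket_price=AVG_TICKET_PRICE):
    """Estimate tickets sold as gross revenue divided by an average ticket price."""
    return [round(gross / avg_ticket_price) for gross in worldwide_gross]

# Three values copied from g_worldwide
worldwide_sample = [438656843, 66947950, 27469621]
tickets_sample = estimate_tickets(worldwide_sample)
print(tickets_sample)
```

Because the movies span many decades and real ticket prices changed over that time, these figures are best read as a rough, price-normalised view of gross revenue and not as actual admissions.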
Creating the Production_Budget_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_budget list as currency strings.
g_budget_x = []
for i in g_budget:
    g_budget_x.append("${:,.0f}".format(i))
print(g_budget_x) #showing the g_budget_x list
['$20,000,000', '$23,000,000', '$15,000,000', '$2,700,000', '$70,000,000', '$30,000,000', '$2,500,000', '$90,000,000', '$666,000', '$85,000,000', '$17,000,000', '$10,000,000', '$22,000,000', '$18,000,000', '$8,200,000', '$60,000,000', '$45,000,000', '$858,000', '$17,000,000', '$300,000', '$10,000,000', '$6,400,000', '$13,000,000', '$1,750,000', '$1,700,000', '$3,000,000']
Checking the number of elements in the 'g_budget_x' list.
len(g_budget_x)
26
Creating the Domestic_Gross_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_domestic list as currency strings.
g_domestic_x = []
for i in g_domestic:
    g_domestic_x.append("${:,.0f}".format(i))
print(g_domestic_x) #showing the g_domestic_x list
['$206,333,165', '$51,764,950', '$25,167,002', '$0', '$100,138,851', '$63,658,910', '$0', '$18,319,860', '$0', '$82,985,708', '$0', '$0', '$75,600,072', '$0', '$163,214,286', '$50,877,145', '$421,785,283', '$102,797,000', '$72,000,000', '$0', '$0', '$30,176,619', '$0', '$7,600,000', '$5,000,000', '$0']
Checking the number of elements in the 'g_domestic_x' list.
len(g_domestic_x)
26
Creating the Foreign_Gross_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_foreign list as currency strings.
g_foreign_x = []
for i in g_foreign:
    g_foreign_x.append("${:,.0f}".format(i))
print(g_foreign_x) #showing the g_foreign_x list
['$232,323,678', '$15,183,000', '$1,302,619', '$0', '$225,361,149', '$182,441,090', '$0', '$50,812,000', '$0', '$61,000,000', '$0', '$0', '$4,891,444', '$0', '$122,999,909', '$39,605,172', '$564,429,585', '$165,203,000', '$71,636', '$0', '$0', '$15,790', '$0', '$377', '$7,000,000', '$0']
Checking the number of elements in the 'g_foreign_x' list.
len(g_foreign_x)
26
Creating the Worldwide_Gross_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_worldwide list as currency strings.
g_worldwide_x = []
for i in g_worldwide:
    g_worldwide_x.append("${:,.0f}".format(i))
print(g_worldwide_x) #showing the g_worldwide_x list
['$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$69,131,860', '$4,517,000', '$143,985,708', '$10,015,449', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$108,998', '$47,707,417', '$30,194,409', '$65,500,000', '$7,600,377', '$12,000,000', '$592,861']
Checking the number of elements in the 'g_worldwide_x' list.
len(g_worldwide_x)
26
Creating the Profit_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_profit list as currency strings.
g_profit_x = []
for i in g_profit:
    g_profit_x.append("${:,.0f}".format(i))
print(g_profit_x) #showing the g_profit_x list
['$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$-20,868,140', '$3,851,000', '$58,985,708', '$-6,984,551', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$-191,002', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000', '$-2,407,139']
Checking the number of elements in the 'g_profit_x' list.
len(g_profit_x)
26
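With the format string used above, a loss is rendered with the minus sign after the dollar sign (for example '$-20,868,140'). If the more conventional '-$20,868,140' is preferred, the sign can be handled separately. This is a minimal sketch, and `format_currency` is a hypothetical helper that is not part of the original notebook.

```python
def format_currency(amount):
    """Format a number as whole dollars, placing any minus sign before the '$'."""
    sign = "-" if amount < 0 else ""
    return f"{sign}${abs(amount):,.0f}"

# Two values copied from the g_profit list above
profit_sample = [418656843, -20868140]
profit_sample_x = [format_currency(p) for p in profit_sample]
print(profit_sample_x)
```

The same helper could replace the repeated format string in each of the `_x` columns, so the formatting rule lives in one place.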
Creating the Tickets_x column for the g_dataframe dataframe (which will be appended to the Drama_DataFrame) by formatting the g_tickets list as comma-separated strings.
g_tickets_x = []
for i in g_tickets:
    g_tickets_x.append("{:,.0f}".format(i))
print(g_tickets_x) #showing the g_tickets_x list
['43,865,684', '6,694,795', '2,746,962', '3,779,964', '32,550,000', '24,610,000', '375,000', '6,913,186', '451,700', '14,398,571', '1,001,545', '1,765,797', '8,049,152', '31,128,100', '28,621,420', '9,048,232', '98,621,487', '26,800,000', '7,207,164', '10,900', '4,770,742', '3,019,441', '6,550,000', '760,038', '1,200,000', '59,286']
Checking the number of elements in the 'g_tickets_x' list.
len(g_tickets_x)
26
Creating the g_dataframe dataframe.
g_dataframe = pd.DataFrame({"Movie":g_name, "Release_Date":g_date, "Genre":g_genre, "Rating":g_rated,
"Production_Budget":g_budget, "Production_Budget_x":g_budget_x,
"Domestic_Gross":g_domestic, "Domestic_Gross_x":g_domestic_x,
"Foreign_Gross":g_foreign, "Foreign_Gross_x":g_foreign_x,
"Worldwide_Gross":g_worldwide, "Worldwide_Gross_x":g_worldwide_x,
"Profit":g_profit, "Profit_x":g_profit_x, "Tickets":g_tickets,
"Tickets_x":g_tickets_x, "Runtime":g_runtime, "Averagerating":g_rating,
"Company":g_company, "Star":g_star, "Director":g_director, "Writer":g_writer
})
The first five rows of the g_dataframe dataframe.
g_dataframe.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Beauty and the Beast 1991 | Novemeber 22, 1991 | Drama | G | 20000000 | $20,000,000 | 206333165 | $206,333,165 | 232323678 | $232,323,678 | ... | 418656843 | $418,656,843 | 43865684 | 43,865,684 | 84 | 8.0 | The Walt Disney Company | Paige O'Hara | Gary Trousdale | Linda Woolverton |
| 1 | The Little Rascals | August 5, 1994 | Drama | G | 23000000 | $23,000,000 | 51764950 | $51,764,950 | 15183000 | $15,183,000 | ... | 43947950 | $43,947,950 | 6694795 | 6,694,795 | 83 | 6.3 | Universal Pictures | Brittany Ashton | Penelope Spheeris | Penelope Spheeris |
| 2 | Ramona and Beezus | July 23, 2010 | Drama | G | 15000000 | $15,000,000 | 25167002 | $25,167,002 | 1302619 | $1,302,619 | ... | 12469621 | $12,469,621 | 2746962 | 2,746,962 | 103 | 6.5 | Fox 2000 Pictures | Joey King | Elizabeth Allen Ressenbaum | Beverly Cleary |
| 3 | The Black Stallion | October 17, 1979 | Drama | G | 2700000 | $2,700,000 | 0 | $0 | 0 | $0 | ... | 35099643 | $35,099,643 | 3779964 | 3,779,964 | 118 | 7.4 | Omni Zoetrope | Kelly Reno | Carral Ballard | Melissa Mathison |
| 4 | The Hunchback of Notre Drame | June 21, 1996 | Drama | G | 70000000 | $70,000,000 | 100138851 | $100,138,851 | 225361149 | $225,361,149 | ... | 255500000 | $255,500,000 | 32550000 | 32,550,000 | 91 | 7.0 | The Walt Disney Company | Demi Moore | Gary Trouside | Victor Hugo |
5 rows × 22 columns
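The individual `len(...)` checks above can also be done in one step before the dataframe is built, since a dataframe can only be created from a dictionary of lists when every list has the same length. The following is a sketch only: `check_equal_lengths` and `sample_columns` are hypothetical names, and the sample holds just two rows copied from the data.

```python
def check_equal_lengths(columns):
    """Return the shared length of all lists in `columns`, or raise if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    # An empty dictionary has no columns, so report a length of 0
    return next(iter(lengths.values()), 0)

sample_columns = {
    "Movie": ['Beauty and the Beast 1991', 'The Little Rascals'],
    "Rating": ['G', 'G'],
    "Production_Budget": [20000000, 23000000],
}
n_rows = check_equal_lengths(sample_columns)
print(n_rows)
```

If any list is too short or too long, the error message names each column with its length, which makes the offending list easy to spot.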
Appending the g_dataframe dataframe to the Drama_DataFrame.
Drama_DataFrame = pd.concat([Drama_DataFrame, g_dataframe], ignore_index=True)
The 'NC-17' rating does not have enough movies to be analyzed, so more 'NC-17' rated movies will be added to the Drama_DataFrame for appropriate analysis.
These are the names of the new 'NC-17 rated' movies for the 'Movie' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_name = ['Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine',
'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts',
'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights',
'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant',
'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris',
'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame',
'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire' ]
print(nc17_name) #showing the nc17_name list
['Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts', 'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights', 'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire']
Checking the number of elements in the 'nc17_name' list.
len(nc17_name)
35
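A length check does not reveal repeated or inconsistently spaced titles. In the list above, 'Shame' appears twice and 'Lust, Caution ' carries a trailing space. The sketch below shows one way to flag such entries before they reach the dataframe. `find_duplicates` is a hypothetical helper and `sample_names` is a short excerpt of nc17_name.

```python
from collections import Counter

def find_duplicates(values):
    """Return the values that occur more than once, ignoring surrounding whitespace."""
    counts = Counter(value.strip() for value in values)
    return sorted(value for value, count in counts.items() if count > 1)

# A short excerpt of the nc17_name list above
sample_names = ['Showgirls', 'The Dreamers', 'Shame', 'Killer Joe', 'Shame', 'Lust, Caution ']
duplicate_names = find_duplicates(sample_names)
print(duplicate_names)
```

Whether a repeated title is a true duplicate or two different films with the same name still has to be judged from the other columns, such as the release date and director.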
These are the names of the directors of the new 'NC-17 rated' movies for the 'Director' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_director = ['Paul Verhoeven', 'Bernardo Bertolucci', 'Steve McQueen', 'Abdellatif Kechiche',
'Derek Cianfrance', 'James Toback', 'Małgośka Szumowska', 'Tim Fehlbaum',
'William Friedkin','Ang Lee', 'May el-Toukhy', 'Sam Raimi', 'Rémy Belvaux',
'Steve McQueen', 'Lars von Trier', 'Pier Paolo Pasolini', 'Xavier Gens',
'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith','Abel Ferrara','John Gulager',
'Russ Meyer', 'Larry Clark', 'Paul Haggis', 'Bernardo Bertolucci','John Waters',
'Ang Lee', 'Todd Solondz', 'Trey Parker', 'John Waters', 'David Mackenzie',
'Ken Russell', 'Christophe Honoré', 'Pedro Almodóvar']
print(nc17_director) #showing the nc17_director list
['Paul Verhoeven', 'Bernardo Bertolucci', 'Steve McQueen', 'Abdellatif Kechiche', 'Derek Cianfrance', 'James Toback', 'Małgośka Szumowska', 'Tim Fehlbaum', 'William Friedkin', 'Ang Lee', 'May el-Toukhy', 'Sam Raimi', 'Rémy Belvaux', 'Steve McQueen', 'Lars von Trier', 'Pier Paolo Pasolini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith', 'Abel Ferrara', 'John Gulager', 'Russ Meyer', 'Larry Clark', 'Paul Haggis', 'Bernardo Bertolucci', 'John Waters', 'Ang Lee', 'Todd Solondz', 'Trey Parker', 'John Waters', 'David Mackenzie', 'Ken Russell', 'Christophe Honoré', 'Pedro Almodóvar']
Checking the number of elements in the 'nc17_director' list.
len(nc17_director)
35
These are the writers of the new 'NC-17 rated' movies for the 'Writer' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_writer = ['Joe Eszterhas', 'Gilbert Adair', 'Abi Morgan', 'Ghalia Lacroix', 'Cami Delavigne',
'James Toback', 'Tine Byrckel', 'Tim Fehlbaum', 'Tracy Letts', 'Hui-Ling Wang',
'Maren Louise Käehne', 'Sam Raimi', 'André Bonzel', 'Abi Morgan', 'Lars von Trier',
'Dacia Maraini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith',
'Abel Ferrara', 'Patrick Melton', 'Roger Ebert', 'Harmony Korine', 'Paul Haggis',
'Franco Arcalli', 'John Waters', 'Hui-Ling Wang', 'Todd Solondz', 'Trey Parker',
'John Waters', ' David Mackenzie', 'Deborah Dalton', 'Christophe Honoré',
'Pedro Almodóvar']
print(nc17_writer) #showing the nc17_writer list
['Joe Eszterhas', 'Gilbert Adair', 'Abi Morgan', 'Ghalia Lacroix', 'Cami Delavigne', 'James Toback', 'Tine Byrckel', 'Tim Fehlbaum', 'Tracy Letts', 'Hui-Ling Wang', 'Maren Louise Käehne', 'Sam Raimi', 'André Bonzel', 'Abi Morgan', 'Lars von Trier', 'Dacia Maraini', 'Xavier Gens', 'Jennifer Lynch', 'Oliver Stone', 'Kevin Smith', 'Abel Ferrara', 'Patrick Melton', 'Roger Ebert', 'Harmony Korine', 'Paul Haggis', 'Franco Arcalli', 'John Waters', 'Hui-Ling Wang', 'Todd Solondz', 'Trey Parker', 'John Waters', '\tDavid Mackenzie', 'Deborah Dalton', 'Christophe Honoré', 'Pedro Almodóvar']
Checking the number of elements in the 'nc17_writer' list.
len(nc17_writer)
35
These are the release dates of the new 'NC-17 rated' movies for the 'Release_Date' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_date = ['September 22, 1995', 'February 6, 2004', 'December 2, 2011', 'October 25, 2013',
'December 29, 2010', 'September 9, 1997', 'April 27, 2012', '22 September 2011',
'July 27, 2012', 'September 28, 2007', 'November 1, 2019', 'October 15, 1981',
'15 January 1993', 'December 2, 2011', 'December 25, 2013', 'July 27, 1980', 'May 9, 2008',
'August 5, 2012', 'August 26, 1994', 'October 19, 1994', 'November 20, 1992 ',
'September 22, 2006', 'June 17, 1970', 'July 28, 1995', 'May 6, 2005', 'January 27, 1973',
'March 17, 1972', 'September 28, 2007', 'October 16, 1998', 'October 23, 1998',
'September 24, 2004 ', 'April 16, 2004', 'October 4, 1991', 'May 13, 2005 ',
'April 3, 1987']
print(nc17_date) #showing the nc17_date list
['September 22, 1995', 'February 6, 2004', 'December 2, 2011', 'October 25, 2013', 'December 29, 2010', 'September 9, 1997', 'April 27, 2012', '22 September 2011', 'July 27, 2012', 'September 28, 2007', 'November 1, 2019', 'October 15, 1981', '15 January 1993', 'December 2, 2011', 'December 25, 2013', 'July 27, 1980', 'May 9, 2008', 'August 5, 2012', 'August 26, 1994', 'October 19, 1994', 'November 20, 1992 ', 'September 22, 2006', 'June 17, 1970', 'July 28, 1995', 'May 6, 2005', 'January 27, 1973', 'March 17, 1972', 'September 28, 2007', 'October 16, 1998', 'October 23, 1998', 'September 24, 2004 ', 'April 16, 2004', 'October 4, 1991', 'May 13, 2005 ', 'April 3, 1987']
Checking the number of elements in the 'nc17_date' list.
len(nc17_date)
35
These are the names of the starring actors of the new 'NC-17 rated' movies for the 'Star' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_star = ['Elizabeth Berkley', 'Eva Green', 'Michael Fassbender', 'Léa Seydoux', 'Ryan Gosling',
'Robert Downey Jr.', 'Juliette Binoche', 'Lisa Vicari', 'Juno Temple', 'Tony Leung Chiu-Wai',
'Trine Dyrholm', 'Bruce Campbell', 'Benoît Poelvoorde', 'Michael Fassbender',
'Charlotte Gainsbourg', 'Ninetto Davoli', 'Karina Testa', 'Vincent D\'Onofrio',
'Woody Harrelson', 'Kevin Smith', 'Harvey Keitel', 'Clu Gulager', 'Marcia McBroom',
'Leo Fitzpatrick', 'Sandra Bullock', 'Marlon Brando', 'David Lochary',
'Tony Leung Chiu-Wai', 'Elizabeth Ashley', 'Michael Dean Jacobs', 'Suzanne Shepherd',
'Tilda Swinton', 'Theresa Russell', 'Louis Garrel', 'Antonio Banderas']
print(nc17_star) #showing the nc17_star list
['Elizabeth Berkley', 'Eva Green', 'Michael Fassbender', 'Léa Seydoux', 'Ryan Gosling', 'Robert Downey Jr.', 'Juliette Binoche', 'Lisa Vicari', 'Juno Temple', 'Tony Leung Chiu-Wai', 'Trine Dyrholm', 'Bruce Campbell', 'Benoît Poelvoorde', 'Michael Fassbender', 'Charlotte Gainsbourg', 'Ninetto Davoli', 'Karina Testa', "Vincent D'Onofrio", 'Woody Harrelson', 'Kevin Smith', 'Harvey Keitel', 'Clu Gulager', 'Marcia McBroom', 'Leo Fitzpatrick', 'Sandra Bullock', 'Marlon Brando', 'David Lochary', 'Tony Leung Chiu-Wai', 'Elizabeth Ashley', 'Michael Dean Jacobs', 'Suzanne Shepherd', 'Tilda Swinton', 'Theresa Russell', 'Louis Garrel', 'Antonio Banderas']
Checking the number of elements in the 'nc17_star' list.
len(nc17_star)
35
These are the names of the production companies of the new 'NC-17 rated' movies for the 'Company' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_company = ['Carolco Pictures', 'Recorded Picture Company', 'Film4', 'Wild Bunch', 'Hunting Lane Films',
'Edward R. Pressman', 'Slot Machine', 'Caligari Film', 'Voltage Pictures',
'River Road Entertainment', 'Nordisk Film', 'Renaissance Pictures',
'Les Artistes Anonymes', 'Film4', 'Zentropa Entertainments', ' United Artists',
'BR Films', 'Anchor Bay Entertainment', 'Regency Enterprises',
'View Askew Productions', 'Aries Films', 'LivePlanet', '20th Century Fox',
'Independent Pictures', 'Bob Yari Productions', 'Produzioni Europee', 'Dreamland',
'River Road Entertainment', 'Killer Films','Avenging Conscience', 'Killer Films',
'Recorded Picture Company', 'Cheap Date', 'Gemini Films', 'El Deseo']
print(nc17_company) #showing the nc17_company list
['Carolco Pictures', 'Recorded Picture Company', 'Film4', 'Wild Bunch', 'Hunting Lane Films', 'Edward R. Pressman', 'Slot Machine', 'Caligari Film', 'Voltage Pictures', 'River Road Entertainment', 'Nordisk Film', 'Renaissance Pictures', 'Les Artistes Anonymes', 'Film4', 'Zentropa Entertainments', '\tUnited Artists', 'BR Films', 'Anchor Bay Entertainment', 'Regency Enterprises', 'View Askew Productions', 'Aries Films', 'LivePlanet', '20th Century Fox', 'Independent Pictures', 'Bob Yari Productions', 'Produzioni Europee', 'Dreamland', 'River Road Entertainment', 'Killer Films', 'Avenging Conscience', 'Killer Films', 'Recorded Picture Company', 'Cheap Date', 'Gemini Films', 'El Deseo']
Checking the number of elements in the 'nc17_company' list.
len(nc17_company)
35
These are the production budgets of the new 'NC-17 rated' movies for the 'Production_Budget' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_budget = [45000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 10000000,
15000000, 19000000, 350000, 1000000, 6500000, 4700000, 904765, 3000000,
700000, 34000000, 230000, 1000000, 3200000, 1000000, 1500000, 6500000,
1250000, 12000, 15000000, 2200000, 1300000, 15000000, 6400000, 50000, 3259572,
612072]
print(nc17_budget) #showing the nc17_budget list
[45000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 10000000, 15000000, 19000000, 350000, 1000000, 6500000, 4700000, 904765, 3000000, 700000, 34000000, 230000, 1000000, 3200000, 1000000, 1500000, 6500000, 1250000, 12000, 15000000, 2200000, 1300000, 15000000, 6400000, 50000, 3259572, 612072]
Checking the number of elements in the 'nc17_budget' list.
len(nc17_budget)
35
These are the domestic gross figures of the new 'NC-17 rated' movies for the 'Domestic_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_domestic = [20350754, 2531462, 4002293, 2199787, 9737892, 2057193, 157508, 169705587, 1291645,
4604982, 0, 2400000, 0, 4002293, 785896, 0, 97182, 0, 50282766, 3073428, 2000022,
56131, 0, 7412216, 55334418, 36144824, 0, 4604982, 2746453, 582024, 1339668, 767373,
0, 71616, 0 ]
print(nc17_domestic) #showing the nc17_domestic list
[20350754, 2531462, 4002293, 2199787, 9737892, 2057193, 157508, 169705587, 1291645, 4604982, 0, 2400000, 0, 4002293, 785896, 0, 97182, 0, 50282766, 3073428, 2000022, 56131, 0, 7412216, 55334418, 36144824, 0, 4604982, 2746453, 582024, 1339668, 767373, 0, 71616, 0]
Checking the number of elements in the 'nc17_domestic' list.
len(nc17_domestic)
35
These are the foreign gross figures of the new 'NC-17 rated' movies for the 'Foreign_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_foreign = [17400000, 12775651, 16410548, 17266048, 6828348, 257833, 3664733, 43414417, 3367465,
60562448, 0, 261944, 0, 16410548, 1308406, 0, 2686353, 0, 797, 820812, 38894, 634741, 0,
13000000, 45838620, 2887, 0, 60562448, 3000000, 45263, 574498, 1794447, 0, 950532, 0]
print(nc17_foreign) #showing the nc17_foreign list
[17400000, 12775651, 16410548, 17266048, 6828348, 257833, 3664733, 43414417, 3367465, 60562448, 0, 261944, 0, 16410548, 1308406, 0, 2686353, 0, 797, 820812, 38894, 634741, 0, 13000000, 45838620, 2887, 0, 60562448, 3000000, 45263, 574498, 1794447, 0, 950532, 0]
Checking the number of elements in the 'nc17_foreign' list.
len(nc17_foreign)
35
These are the worldwide gross figures of the new 'NC-17 rated' movies for the 'Worldwide_Gross' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_worldwide = [37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110,
65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093,
50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711 , 413802,
65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]
print(nc17_worldwide) #showing the nc17_worldwide list
[37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110, 65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093, 50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]
Checking the number of elements in the 'nc17_worldwide' list.
len(nc17_worldwide)
35
These are the runtimes of the new 'NC-17 rated' movies for the 'Runtime' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_runtime = [97, 130, 101, 180, 120, 84, 99, 89, 102, 158, 187, 85, 96, 101, 145, 155, 108, 94, 119,
62, 96, 95, 109, 91, 112, 129, 92, 158, 134, 95, 84, 98, 80, 110, 82]
print(nc17_runtime) #showing the nc17_runtime list
[97, 130, 101, 180, 120, 84, 99, 89, 102, 158, 187, 85, 96, 101, 145, 155, 108, 94, 119, 62, 96, 95, 109, 91, 112, 129, 92, 158, 134, 95, 84, 98, 80, 110, 82]
Checking the number of elements in the 'nc17_runtime' list.
len(nc17_runtime)
35
These are the average ratings of the new 'NC-17 rated' movies for the 'Averagerating' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_rating = [4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4,
7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
print(nc17_rating) #showing the nc17_rating list
[4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
Checking the number of elements in the 'nc17_rating' list.
len(nc17_rating)
35
This is for the 'Genre' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_genre = []
for i in range(35):nc17_genre.append('Drama')
print(nc17_genre) #showing the nc17_genre list
['Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama', 'Drama']
Checking the number of elements in the 'nc17_genre' list.
len(nc17_genre)
35
This is for the 'Rating' column in the new nc17_dataframe dataframe that will be appended to the Drama_DataFrame.
nc17_rated = []
for i in range(35):nc17_rated.append('NC-17')
print(nc17_rated) #showing the nc17_rated list
['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']
Checking the number of elements in the 'nc17_rated' list.
len(nc17_rated)
35
Creating the Production_Budget_x column by formatting the nc17_budget list as currency, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_budget_x = []
for i in nc17_budget:
    nc17_budget_x.append("${:,.0f}".format(i))
print(nc17_budget_x) #showing the nc17_budget_x list
['$45,000,000', '$15,000,000', '$6,500,000', '$4,074,940', '$1,000,000', '$1,000,000', '$3,565,572', '$12,000,000', '$10,000,000', '$15,000,000', '$19,000,000', '$350,000', '$1,000,000', '$6,500,000', '$4,700,000', '$904,765', '$3,000,000', '$700,000', '$34,000,000', '$230,000', '$1,000,000', '$3,200,000', '$1,000,000', '$1,500,000', '$6,500,000', '$1,250,000', '$12,000', '$15,000,000', '$2,200,000', '$1,300,000', '$15,000,000', '$6,400,000', '$50,000', '$3,259,572', '$612,072']
Checking the number of elements in the 'nc17_budget_x' list.
len(nc17_budget_x)
35
Creating the Domestic_Gross_x column by formatting the nc17_domestic list as currency, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_domestic_x = []
for i in nc17_domestic:
    nc17_domestic_x.append("${:,.0f}".format(i))
print(nc17_domestic_x) #showing the nc17_domestic_x list
['$20,350,754', '$2,531,462', '$4,002,293', '$2,199,787', '$9,737,892', '$2,057,193', '$157,508', '$169,705,587', '$1,291,645', '$4,604,982', '$0', '$2,400,000', '$0', '$4,002,293', '$785,896', '$0', '$97,182', '$0', '$50,282,766', '$3,073,428', '$2,000,022', '$56,131', '$0', '$7,412,216', '$55,334,418', '$36,144,824', '$0', '$4,604,982', '$2,746,453', '$582,024', '$1,339,668', '$767,373', '$0', '$71,616', '$0']
Checking the number of elements in the 'nc17_domestic_x' list.
len(nc17_domestic_x)
35
Creating the Foreign_Gross_x column by formatting the nc17_foreign list as currency, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_foreign_x = []
for i in nc17_foreign:
    nc17_foreign_x.append("${:,.0f}".format(i))
print(nc17_foreign_x) #showing the nc17_foreign_x list
['$17,400,000', '$12,775,651', '$16,410,548', '$17,266,048', '$6,828,348', '$257,833', '$3,664,733', '$43,414,417', '$3,367,465', '$60,562,448', '$0', '$261,944', '$0', '$16,410,548', '$1,308,406', '$0', '$2,686,353', '$0', '$797', '$820,812', '$38,894', '$634,741', '$0', '$13,000,000', '$45,838,620', '$2,887', '$0', '$60,562,448', '$3,000,000', '$45,263', '$574,498', '$1,794,447', '$0', '$950,532', '$0']
Checking the number of elements in the 'nc17_foreign_x' list.
len(nc17_foreign_x)
35
Creating the Worldwide_Gross_x column by formatting the nc17_worldwide list as currency, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_worldwide_x = []
for i in nc17_worldwide:
    nc17_worldwide_x.append("${:,.0f}".format(i))
print(nc17_worldwide_x) #showing the nc17_worldwide_x list
['$37,750,754', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$4,659,110', '$65,167,430', '$1,236,844', '$2,661,944', '$205,569', '$20,412,841', '$2,094,302', '$3,453,416', '$2,783,535', '$103,093', '$50,283,563', '$3,894,240', '$2,038,916', '$690,872', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$627,287', '$1,914,166', '$2,561,820', '$1,008,404', '$1,022,148', '$1,470,809']
Checking the number of elements in the 'nc17_worldwide_x' list.
len(nc17_worldwide_x)
35
These are the profits of the new 'NC-17 rated' movies for the 'Profit' column in the new nc17_dataframe that will be appended to the Drama_DataFrame. Each profit is calculated by subtracting the movie's budget from its worldwide gross.
nc17_profit = []
for x,y in enumerate(nc17_worldwide):
    nc17_profit.append(y-nc17_budget[x])
print(nc17_profit) #showing the nc17_profit list
[-7249246, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, -5340890, 50167430, -17763156, 2311944, -794431, 13912841, -2605698, 2548651, -216465, -596907, 16283563, 3664240, 1038916, -2509128, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, -672713, -13085834, -3838180, 958404, -2237424, 858737]
Checking the number of elements in the 'nc17_profit' list.
len(nc17_profit)
35
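The same subtraction can be written by pairing the two lists with zip, which avoids indexing by position. This is a minimal sketch using the first three NC-17 values from the lists above; the names worldwide_sample, budget_sample and profit_sample are illustrative and do not appear in the notebook.

```python
# First three values of nc17_worldwide and nc17_budget, copied from the lists above.
worldwide_sample = [37750754, 15307113, 20412841]
budget_sample = [45000000, 15000000, 6500000]

# Profit = worldwide gross - production budget, computed pair by pair.
profit_sample = [gross - budget for gross, budget in zip(worldwide_sample, budget_sample)]
print(profit_sample)
```

The result matches the first three entries of nc17_profit.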
Creating the Profit_x column by formatting the nc17_profit list as currency, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_profit_x = []
for i in nc17_profit:
    nc17_profit_x.append("${:,.0f}".format(i))
print(nc17_profit_x) #showing the nc17_profit_x list
['$-7,249,246', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$-5,340,890', '$50,167,430', '$-17,763,156', '$2,311,944', '$-794,431', '$13,912,841', '$-2,605,698', '$2,548,651', '$-216,465', '$-596,907', '$16,283,563', '$3,664,240', '$1,038,916', '$-2,509,128', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$-672,713', '$-13,085,834', '$-3,838,180', '$958,404', '$-2,237,424', '$858,737']
Checking the number of elements in the 'nc17_profit_x' list.
len(nc17_profit_x)
35
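Note that the "${:,.0f}" pattern places the minus sign after the dollar sign for losses, for example '$-7,249,246'. If the more conventional '-$7,249,246' is preferred, one option is a small helper like the sketch below. The name format_usd is hypothetical and is not part of the notebook, which keeps the '$-' form throughout.

```python
def format_usd(amount):
    """Format a number as whole dollars, placing any minus sign before the $."""
    sign = "-" if amount < 0 else ""
    return "{}${:,.0f}".format(sign, abs(amount))

print(format_usd(-7249246))
print(format_usd(307113))
```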
These are the estimated numbers of tickets sold for the new 'NC-17 rated' movies, for the 'Tickets' column in the new nc17_dataframe that will be appended to the Drama_DataFrame. Each estimate is the worldwide gross divided by 10, which amounts to assuming an average ticket price of $10.
nc17_tickets = []
for i in nc17_worldwide:
    nc17_tickets.append(round(i/10))
print(nc17_tickets) #showing the nc17_tickets list
[3775075, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 465911, 6516743, 123684, 266194, 20557, 2041284, 209430, 345342, 278354, 10309, 5028356, 389424, 203892, 69087, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 62729, 191417, 256182, 100840, 102215, 147081]
Checking the number of elements in the 'nc17_tickets' list.
len(nc17_tickets)
35
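The ticket estimate can be wrapped in a function so that the assumed price is explicit. This is a sketch only: estimate_tickets and its ticket_price parameter are hypothetical names, and the default of 10 simply mirrors the division by 10 used above.

```python
def estimate_tickets(worldwide_gross, ticket_price=10):
    """Estimate tickets sold as gross divided by an assumed average ticket price."""
    if ticket_price <= 0:
        raise ValueError("ticket_price must be positive")
    return round(worldwide_gross / ticket_price)

print(estimate_tickets(37750754))
print(estimate_tickets(19465835))
```

Python's round sends exact halves to the nearest even integer, which is why 1946583.5 becomes 1946584 in the list above.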
Creating the Tickets_x column by formatting the nc17_tickets values as comma-separated strings, for the nc17_dataframe that will be appended to the Drama_DataFrame.
nc17_tickets_x = []
for i in nc17_tickets:
    nc17_tickets_x.append("{:,.0f}".format(i))
print(nc17_tickets_x) #showing the nc17_tickets_x list
['3,775,075', '1,530,711', '2,041,284', '1,946,584', '1,656,624', '231,503', '382,224', '21,312,000', '465,911', '6,516,743', '123,684', '266,194', '20,557', '2,041,284', '209,430', '345,342', '278,354', '10,309', '5,028,356', '389,424', '203,892', '69,087', '900,000', '2,041,222', '10,117,304', '3,614,771', '41,380', '6,516,743', '574,645', '62,729', '191,417', '256,182', '100,840', '102,215', '147,081']
Checking the number of elements in the 'nc17_tickets_x' list.
len(nc17_tickets_x)
35
Creating the nc17_dataframe dataframe.
nc17_dataframe = pd.DataFrame({"Movie":nc17_name, "Release_Date":nc17_date, "Genre":nc17_genre,
"Rating":nc17_rated,
"Production_Budget":nc17_budget, "Production_Budget_x":nc17_budget_x,
"Domestic_Gross":nc17_domestic, "Domestic_Gross_x":nc17_domestic_x,
"Foreign_Gross":nc17_foreign, "Foreign_Gross_x":nc17_foreign_x,
"Worldwide_Gross":nc17_worldwide, "Worldwide_Gross_x":nc17_worldwide_x,
"Profit":nc17_profit, "Profit_x":nc17_profit_x, "Tickets":nc17_tickets,
"Tickets_x":nc17_tickets_x, "Runtime":nc17_runtime, "Averagerating":nc17_rating,
"Company":nc17_company, "Star":nc17_star, "Director":nc17_director, "Writer":nc17_writer
})
The first five rows of the nc17_dataframe dataframe.
nc17_dataframe.head()
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | ... | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Showgirls | September 22, 1995 | Drama | NC-17 | 45000000 | $45,000,000 | 20350754 | $20,350,754 | 17400000 | $17,400,000 | ... | -7249246 | $-7,249,246 | 3775075 | 3,775,075 | 97 | 4.9 | Carolco Pictures | Elizabeth Berkley | Paul Verhoeven | Joe Eszterhas |
| 1 | The Dreamers | February 6, 2004 | Drama | NC-17 | 15000000 | $15,000,000 | 2531462 | $2,531,462 | 12775651 | $12,775,651 | ... | 307113 | $307,113 | 1530711 | 1,530,711 | 130 | 7.1 | Recorded Picture Company | Eva Green | Bernardo Bertolucci | Gilbert Adair |
| 2 | Shame | December 2, 2011 | Drama | NC-17 | 6500000 | $6,500,000 | 4002293 | $4,002,293 | 16410548 | $16,410,548 | ... | 13912841 | $13,912,841 | 2041284 | 2,041,284 | 101 | 7.2 | Film4 | Michael Fassbender | Steve McQueen | Abi Morgan |
| 3 | Blue Is the Warmest Colour | October 25, 2013 | Drama | NC-17 | 4074940 | $4,074,940 | 2199787 | $2,199,787 | 17266048 | $17,266,048 | ... | 15390895 | $15,390,895 | 1946584 | 1,946,584 | 180 | 7.7 | Wild Bunch | Léa Seydoux | Abdellatif Kechiche | Ghalia Lacroix |
| 4 | Blue Valentine | December 29, 2010 | Drama | NC-17 | 1000000 | $1,000,000 | 9737892 | $9,737,892 | 6828348 | $6,828,348 | ... | 15566240 | $15,566,240 | 1656624 | 1,656,624 | 120 | 7.4 | Hunting Lane Films | Ryan Gosling | Derek Cianfrance | Cami Delavigne |
5 rows × 22 columns
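The repeated len(...) checks above confirm that every list has 35 elements before the dataframe is built. One way to perform that check in a single pass is sketched below. The helper check_equal_lengths and the sample_columns dictionary are hypothetical; the sample values are the first three entries of the corresponding NC-17 lists.

```python
def check_equal_lengths(columns):
    """Return the shared length of all lists in `columns`, or raise ValueError."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("column lengths differ: {}".format(lengths))
    return next(iter(lengths.values()), 0)

sample_columns = {
    "Runtime": [97, 130, 101],
    "Averagerating": [4.9, 7.1, 7.2],
    "Genre": ["Drama"] * 3,
}
print(check_equal_lengths(sample_columns))
```

Passing the same dictionary that is given to pd.DataFrame would report which column is short or long before the constructor raises its own error.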
Appending the nc17_dataframe dataframe to the Drama_DataFrame.
Drama_DataFrame = pd.concat([Drama_DataFrame, nc17_dataframe], ignore_index=True)
The NC-17 rating now has more movies added to it.
from collections import Counter

grouped2 = []
for i, rating in enumerate(Drama_DataFrame.Rating):
    # keep the rating only for rows in the Drama genre
    if Drama_DataFrame.Genre[i] == 'Drama':
        grouped2.append(rating)
grouped2 = Counter(grouped2)
grouped2
Counter({'PG': 67, 'R': 77, 'PG-13': 76, 'Not Rated': 3, 'NC-17': 49, 'G': 34})
Some of the Worldwide_Gross values are strings instead of ints; this code converts them back to int.
new = []
for i in Drama_DataFrame.Worldwide_Gross:
    if isinstance(i, str):
        new.append(int(i.replace("$","").replace(",","")))
    else:
        new.append(i)
print(new) #showing the new list
[180047784, 142634358, 693698673, 449948323, 634454789, 54462971, 368567189, 137551594, 47818913, 84154026, 381398492, 371350619, 90552675, 74966854, 134612435, 213591522, 179748880, 108660270, 41642166, 26387039, 71004627, 203127894, 48478084, 570998101, 162498338, 169590606, 16340767, 31124367, 116809717, 50647416, 173567581, 96068724, 97143987, 85309093, 252276928, 61721826, 24687524, 63802928, 165552290, 160558438, 15826984, 197618160, 68984536, 16481405, 94050951, 15815509, 41059418, 213120004, 142033509, 96633833, 66540205, 29847480, 82917283, 6792768, 64282881, 77735925, 32398681, 31054727, 4065020, 38017873, 5046038, 304604712, 92678948, 208265198, 46604054, 22281732, 28270399, 3727746, 76086711, 14189810, 7680250, 7719630, 334522294, 38028230, 52545707, 56506120, 8217571, 7585011, 128955898, 20601987, 59168692, 11173718, 528731, 34044909, 331266710, 38358392, 33069303, 36262783, 11831131, 32909437, 19859167, 23477345, 35830713, 12034913, 42843521, 78356170, 62076141, 56178935, 70133905, 61603136, 31556959, 2179623, 36787044, 21817298, 81831866, 21971021, 77733867, 50827466, 31187727, 36964656, 10765283, 382946, 20412841, 16369708, 148806510, 41699612, 17499242, 18945682, 2821010, 6205034, 17536004, 4972016, 1027760, 1200000, 57273049, 679482, 40454520, 20433227, 38969037, 73975239, 23251930, 15298355, 35185884, 16610760, 16131551, 11295324, 10153415, 2088390, 7482387, 6328516, 21270290, 12231500, 14244931, 5552584, 16566240, 5438911, 1156309, 852399, 354836, 3728400, 62375, 2102779, 429448, 2769782, 9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 534816, 37306334, 47494916, 19344615, 38741732, 114830111, 43545364, 18948425, 3438735, 137587063, 64605762, 33473297, 89137047, 8526288, 64667874, 106269971, 35656130, 3987768, 7025496, 152036382, 171120329, 13835130, 14859394, 134582776, 6101815, 63954968, 10769960, 32255440, 15164458, 127956187, 2819485, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14920781, 
3281232, 14923752, 125052686, 549368315, 6668025, 199078, 64892670, 4786789, 8443124, 2044892, 2400000, 1705908, 80008942, 48000000, 17356268, 1008404, 277845, 1614784, 20412216, 20350754, 98410061, 496059, 15121165, 1022148, 67091915, 20412841, 19465835, 195494, 2411143, 1025228, 18587135, 8721243, 40300, 10015449, 80693537, 54766923, 77211836, 34718173, 1951683, 636796, 2447576, 9171289, 3256082, 13000000, 11000000, 438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 69131860, 4517000, 143985708, 10015449, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 108998, 47707417, 30194409, 65500000, 7600377, 12000000, 592861, 37750754, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 4659110, 65167430, 1236844, 2661944, 205569, 20412841, 2094302, 3453416, 2783535, 103093, 50283563, 3894240, 2038916, 690872, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 627287, 1914166, 2561820, 1008404, 1022148, 1470809]
Checking the number of elements in the 'new' list.
len(new)
306
Now that the values are all integers, the Worldwide_Gross column of the dataframe is updated with the new and improved values.
Drama_DataFrame.Worldwide_Gross = new
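The string-to-integer conversion used for this column can be isolated in a small function, which makes it easy to check on a few values first. This is a sketch under stated assumptions: parse_usd is a hypothetical helper, and unlike the loop above it also passes non-string values through int().

```python
def parse_usd(value):
    """Return an int for strings like '$1,234' or '$-1,234'; convert other values with int()."""
    if isinstance(value, str):
        return int(value.replace("$", "").replace(",", ""))
    return int(value)

# A mix of formatted strings and plain integers, as found in the column.
mixed = ["$37,750,754", 15307113, "$-7,249,246"]
print([parse_usd(v) for v in mixed])
```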
Some of the Worldwide_Gross_x values are ints instead of dollar-formatted strings; this code converts them to dollar strings.
new1 = []
for i in Drama_DataFrame.Worldwide_Gross_x:
    if isinstance(i, int):
        new1.append('${:,.0f}'.format(i))
    else:
        new1.append(i)
print(new1) #showing the new1 list
['$180,047,784', '$142,634,358', '$693,698,673', '$449,948,323', '$634,454,789', '$54,462,971', '$368,567,189', '$137,551,594', '$47,818,913', '$84,154,026', '$381,398,492', '$371,350,619', '$90,552,675', '$74,966,854', '$134,612,435', '$213,591,522', '$179,748,880', '$108,660,270', '$41,642,166', '$26,387,039', '$71,004,627', '$203,127,894', '$48,478,084', '$570,998,101', '$162,498,338', '$169,590,606', '$16,340,767', '$31,124,367', '$116,809,717', '$50,647,416', '$173,567,581', '$96,068,724', '$97,143,987', '$85,309,093', '$252,276,928', '$61,721,826', '$24,687,524', '$63,802,928', '$165,552,290', '$160,558,438', '$15,826,984', '$197,618,160', '$68,984,536', '$16,481,405', '$94,050,951', '$15,815,509', '$41,059,418', '$213,120,004', '$142,033,509', '$96,633,833', '$66,540,205', '$29,847,480', '$82,917,283', '$6,792,768', '$64,282,881', '$77,735,925', '$32,398,681', '$31,054,727', '$4,065,020', '$38,017,873', '$5,046,038', '$304,604,712', '$92,678,948', '$208,265,198', '$46,604,054', '$22,281,732', '$28,270,399', '$3,727,746', '$76,086,711', '$14,189,810', '$7,680,250', '$7,719,630', '$334,522,294', '$38,028,230', '$52,545,707', '$56,506,120', '$8,217,571', '$7,585,011', '$128,955,898', '$20,601,987', '$59,168,692', '$11,173,718', '$528,731', '$34,044,909', '$331,266,710', '$38,358,392', '$33,069,303', '$36,262,783', '$11,831,131', '$32,909,437', '$19,859,167', '$23,477,345', '$35,830,713', '$12,034,913', '$42,843,521', '$78,356,170', '$62,076,141', '$56,178,935', '$70,133,905', '$61,603,136', '$31,556,959', '$2,179,623', '$36,787,044', '$21,817,298', '$81,831,866', '$21,971,021', '$77,733,867', '$50,827,466', '$31,187,727', '$36,964,656', '$10,765,283', '$382,946', '$20,412,841', '$16,369,708', '$148,806,510', '$41,699,612', '$17,499,242', '$18,945,682', '$2,821,010', '$6,205,034', '$17,536,004', '$4,972,016', '$1,027,760', '$1,200,000', '$57,273,049', '$679,482', '$40,454,520', '$20,433,227', '$38,969,037', '$73,975,239', '$23,251,930', '$15,298,355', 
'$35,185,884', '$16,610,760', '$16,131,551', '$11,295,324', '$10,153,415', '$2,088,390', '$7,482,387', '$6,328,516', '$21,270,290', '$12,231,500', '$14,244,931', '$5,552,584', '$16,566,240', '$5,438,911', '$1,156,309', '$852,399', '$354,836', '$3,728,400', '$62,375', '$2,102,779', '$429,448', '$2,769,782', '$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$534,816', '$37,306,334', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$43,545,364', '$18,948,425', '$3,438,735', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$8,526,288', '$64,667,874', '$106,269,971', '$35,656,130', '$3,987,768', '$7,025,496', '$152,036,382', '$171,120,329', '$13,835,130', '$14,859,394', '$134,582,776', '$6,101,815', '$63,954,968', '$10,769,960', '$32,255,440', '$15,164,458', '$127,956,187', '$2,819,485', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,920,781', '$3,281,232', '$14,923,752', '$125,052,686', '$549,368,315', '$6,668,025', '$199,078', '$64,892,670', '$4,786,789', '$8,443,124', '$2,044,892', '$2,400,000', '$1,705,908', '$80,008,942', '$48,000,000', '$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$20,350,754', '$98,410,061', '$496,059', '$15,121,165', '$1,022,148', '$67,091,915', '$20,412,841', '$19,465,835', '$195,494', '$2,411,143', '$1,025,228', '$18,587,135', '$8,721,243', '$40,300', '$10,015,449', '$80,693,537', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$636,796', '$2,447,576', '$9,171,289', '$3,256,082', '$13,000,000', '$11,000,000', '$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$69,131,860', '$4,517,000', '$143,985,708', '$10,015,449', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$108,998', '$47,707,417', '$30,194,409', '$65,500,000', 
'$7,600,377', '$12,000,000', '$592,861', '$37,750,754', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$4,659,110', '$65,167,430', '$1,236,844', '$2,661,944', '$205,569', '$20,412,841', '$2,094,302', '$3,453,416', '$2,783,535', '$103,093', '$50,283,563', '$3,894,240', '$2,038,916', '$690,872', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$627,287', '$1,914,166', '$2,561,820', '$1,008,404', '$1,022,148', '$1,470,809']
Checking the number of elements in the 'new1' list.
len(new1)
306
Now that the values are all strings, the Worldwide_Gross_x column of the dataframe is updated with the new and improved values.
Drama_DataFrame.Worldwide_Gross_x = new1
This is the final product of all the editing and merging of the other csv files, together with the dataframes of extra 'R', 'G', 'NC-17' and 'PG' rated movies that were added because of an uneven distribution of the system ratings in the previous dataframe. Drama_DataFrame will be the dataframe used throughout this analysis.
pd.set_option('display.max_columns', None)#showing all the columns
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
This is the blueprint for creating the first visualization, Return on Investment (ROI). Pandas dataframes will be used to create these visualizations. Each dataframe is styled with the Styler class by passing style functions into Styler.apply or Styler.applymap. The styling is performed after the data in the dataframes has been processed. There will be five dataframes based on the system ratings: 'PG', 'PG-13', 'R', 'NC-17' and 'G'. The Styler renders the dataframe as HTML and leverages the CSS styling language to control many parameters, including colors, fonts, borders and backgrounds.
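The dataframe used in these visualizations is Drama_DataFrame, which contains movies in the Drama genre. It consists of the following columns: Movie, Release_Date, Genre, Rating, Production_Budget, Production_Budget_x, Domestic_Gross, Domestic_Gross_x, Foreign_Gross, Foreign_Gross_x, Worldwide_Gross, Worldwide_Gross_x, Profit, Profit_x, Tickets, Tickets_x, Runtime, Averagerating, Company, Star, Director and Writer.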
There will be five styled dataframes, each holding key information about the movies. They are categorized by system rating: 'R', 'PG-13', 'PG', 'G' and 'NC-17'. Every dataframe has the same five columns.
The dataframes need visual components that convey the information clearly and that distinguish the groups from one another, and formatting is applied to achieve this.
Itables is a library that is installed so that all dataframes are shown as interactive datatables.
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)
Drama_DataFrame is the dataframe that will be used throughout this analysis. (this dataframe is interactive)
Drama_DataFrame
The following steps build the dataframe of 'R-rated' movies in the 'Drama Genre', based on the 'ROI' of each movie.
Index of all the 'R' rated movies.
r_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'R':
        r_index.append(i)
print(r_index) #showing the r_index list
[1, 3, 5, 6, 9, 10, 11, 13, 14, 23, 29, 36, 39, 53, 55, 56, 57, 58, 59, 64, 66, 67, 71, 76, 77, 81, 82, 84, 85, 87, 88, 90, 92, 93, 94, 97, 98, 101, 103, 106, 110, 111, 116, 118, 120, 121, 124, 125, 126, 127, 128, 130, 133, 134, 135, 136, 137, 139, 140, 142, 144, 145, 146, 147, 150, 152, 153, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244]
Checking the number of elements in the 'r_index' list.
len(r_index)
77
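The index list can also be obtained with a pandas boolean mask instead of an explicit loop. This is a minimal sketch on a four-row sample taken from the first rows of Drama_DataFrame; the names sample, is_r, r_index_sample and r_names_sample are illustrative.

```python
import pandas as pd

# First four movies of Drama_DataFrame with their system ratings.
sample = pd.DataFrame({
    "Movie": ["Hugo", "The Wolfman", "Gravity", "Django Unchained"],
    "Rating": ["PG", "R", "PG-13", "R"],
})

# True for every row whose rating is 'R'.
is_r = sample["Rating"] == "R"

r_index_sample = sample.index[is_r].tolist()         # row positions of the R-rated movies
r_names_sample = sample.loc[is_r, "Movie"].tolist()  # their titles
print(r_index_sample)
print(r_names_sample)
```

The same mask can select any other column, which would replace the separate loops over r_index in the steps that follow.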
Getting the Profit for all 'R' rated movies.
r_profit = []
for i in r_index:
    r_profit.append(Drama_DataFrame.Profit[i])
print(r_profit) #showing the r_profit list
[-7365642.0, 349948323.0, -13537029.0, 307567189.0, 24154026.0, 326398492.0, 316350619.0, 19966854.0, 82112435.0, 530998101.0, 13147416.0, -10312476.0, 129558438.0, -18207232.0, 54735925.0, 9898681.0, 8554727.0, -17934980.0, 17017873.0, 26604054.0, 8270399.0, -16272254.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, 318266710.0, 25358392.0, 23262783.0, -1168869.0, 7859167.0, 23830713.0, 34913.0, 31043521.0, 45178935.0, 60133905.0, -7820377.0, 12417298.0, 69233867.0, 3765283.0, -6617054.0, 12499242.0, -2178990.0, 12636004.0, 222016.0, 53273049.0, -3320518.0, 36954520.0, 17033227.0, 35669037.0, 20251930.0, 14610760.0, 14131551.0, 9295324.0, 8153415.0, 88390.0, 4328516.0, 19282640.0, 12744931.0, 15566240.0, 4438911.0, 156309.0, -147601.0, -187625.0, 294448.0, 2669782.0, 48766923.0, 68711836.0, 14718173.0, 1851683.0, -25363204.0, -4052424.0, -12828711.0, 556082.0, 1500000.0, 2000000.0]
Checking the number of elements in the 'r_profit' list.
len(r_profit)
77
Getting the Cost for all 'R' rated movies.
r_cost = []
for i in r_index:
    r_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(r_cost) #showing the r_cost list
['$150,000,000', '$100,000,000', '$68,000,000', '$61,000,000', '$60,000,000', '$55,000,000', '$55,000,000', '$55,000,000', '$52,500,000', '$40,000,000', '$37,500,000', '$35,000,000', '$31,000,000', '$25,000,000', '$23,000,000', '$22,500,000', '$22,500,000', '$22,000,000', '$21,000,000', '$20,000,000', '$20,000,000', '$20,000,000', '$18,000,000', '$16,000,000', '$16,000,000', '$15,000,000', '$15,000,000', '$13,000,000', '$13,000,000', '$13,000,000', '$13,000,000', '$12,000,000', '$12,000,000', '$12,000,000', '$11,800,000', '$11,000,000', '$10,000,000', '$10,000,000', '$9,400,000', '$8,500,000', '$7,000,000', '$7,000,000', '$5,000,000', '$5,000,000', '$4,900,000', '$4,750,000', '$4,000,000', '$4,000,000', '$3,500,000', '$3,400,000', '$3,300,000', '$3,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$2,000,000', '$1,987,650', '$1,500,000', '$1,000,000', '$1,000,000', '$1,000,000', '$1,000,000', '$250,000', '$135,000', '$100,000', '$6,000,000', '$8,500,000', '$20,000,000', '$100,000', '$26,000,000', '$6,500,000', '$22,000,000', '$2,700,000', '$11,500,000', '$9,000,000']
Checking the number of elements in the 'r_cost' list.
len(r_cost)
77
Getting the Name for all 'R' rated movies.
r_name = []
for i in r_index:
    r_name.append(Drama_DataFrame.Movie[i])
print(r_name) #showing the r_name list
['The Wolfman', 'Django Unchained', 'Downsizing', 'Gone Girl', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Crimson Peak', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'The Master', 'Biutiful', 'Flight', 'Tulip Fever', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'Stone', 'For Colored Girls', 'The Debt', 'Let Me In', 'By the Sea', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Black Swan', 'Ex Machina', 'Room', 'Chloe', 'If Beale Street Could Talk', 'Arbitrage', 'Stoker', 'Carol', 'Quartet', 'Hereditary', 'Coriolanus', 'Melancholia', 'Manchester by the Sea', 'We Need to Talk About Kevin', 'Hesher', 'Addicted', 'Everything Must Go', 'Mommy', 'Take Shelter', 'Boyhood', 'Stake Land', 'The Witch', 'Margin Call', 'Whiplash', 'Before Midnight', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'Knock Knock', 'Buried', 'Unsane', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'I Origins', 'The Canyons', 'Sound of My Voice', 'A Ghost Story', 'Ordinary People', 'Fame', 'Endless Love', 'Ghost Story', 'One from the Heart', 'The Hand', 'Pennies from Heaven', 'Zoot Suit', 'Rich and Famous', 'Raggedy Man']
Checking the number of elements in the 'r_name' list.
len(r_name)
77
Getting the ROI for all 'R' rated movies. The values are read from the Profit_x column, so the return is expressed as a dollar amount.
r_return_on_investment = []
for i in r_index:
    r_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(r_return_on_investment) #showing the r_return_on_investment list
['$-7,365,642', '$349,948,323', '$-13,537,029', '$307,567,189', '$24,154,026', '$326,398,492', '$316,350,619', '$19,966,854', '$82,112,435', '$530,998,101', '$13,147,416', '$-10,312,476', '$129,558,438', '$-18,207,232', '$54,735,925', '$9,898,681', '$8,554,727', '$-17,934,980', '$17,017,873', '$26,604,054', '$8,270,399', '$-16,272,254', '$-10,280,370', '$-7,782,429', '$-8,414,989', '$-3,826,282', '$-14,471,269', '$318,266,710', '$25,358,392', '$23,262,783', '$-1,168,869', '$7,859,167', '$23,830,713', '$34,913', '$31,043,521', '$45,178,935', '$60,133,905', '$-7,820,377', '$12,417,298', '$69,233,867', '$3,765,283', '$-6,617,054', '$12,499,242', '$-2,178,990', '$12,636,004', '$222,016', '$53,273,049', '$-3,320,518', '$36,954,520', '$17,033,227', '$35,669,037', '$20,251,930', '$14,610,760', '$14,131,551', '$9,295,324', '$8,153,415', '$88,390', '$4,328,516', '$19,282,640', '$12,744,931', '$15,566,240', '$4,438,911', '$156,309', '$-147,601', '$-187,625', '$294,448', '$2,669,782', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$-25,363,204', '$-4,052,424', '$-12,828,711', '$556,082', '$1,500,000', '$2,000,000']
Checking the number of elements in the 'r_return_on_investment' list.
len(r_return_on_investment)
77
Getting the Ratings of all 'R' rated movies.
r_rating = []
for i in r_index:
    r_rating.append(Drama_DataFrame.Averagerating[i])
print(r_rating) #showing the r_rating list
[5.8, 8.4, 5.7, 8.1, 5.7, 4.6, 4.5, 6.5, 7.4, 4.1, 7.1, 7.5, 7.3, 6.2, 7.1, 7.5, 7.1, 5.6, 6.1, 6.9, 7.1, 5.3, 7.5, 6.6, 6.6, 7.1, 6.9, 8.0, 7.7, 8.2, 6.9, 7.2, 6.6, 6.8, 7.2, 6.8, 7.3, 8.7, 7.2, 7.8, 7.5, 7.0, 5.2, 6.4, 8.1, 7.4, 7.9, 6.5, 6.8, 7.1, 8.5, 7.9, 5.3, 7.2, 7.6, 6.2, 7.1, 4.9, 7.0, 6.4, 7.4, 6.9, 6.2, 7.4, 3.8, 6.6, 6.8, 7.7, 6.6, 4.9, 6.3, 6.5, 5.5, 6.5, 6.8, 5.9, 6.8]
Checking the number of elements in the 'r_rating' list.
len(r_rating)
77
Getting the Profit Percentage of all 'R' rated movies.
r_percent_profit = []
for i in r_index:
    percent = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    r_percent_profit.append(int(round(percent,0)))
print(r_percent_profit) #showing the r_percent_profit list
[-5, 350, -20, 504, 40, 593, 575, 36, 156, 1327, 35, -29, 418, -73, 238, 44, 38, -82, 81, 133, 41, -81, -57, -49, -53, -26, -96, 2448, 195, 179, -9, 65, 199, 0, 263, 411, 601, -78, 132, 815, 54, -95, 250, -44, 258, 5, 1332, -83, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557, 444, 16, -15, -75, 218, 2670, 813, 808, 74, 1852, -98, -62, -58, 21, 13, 22]
Checking the number of elements in the 'r_percent_profit' list.
len(r_percent_profit)
77
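The profit percentage above is profit divided by production budget, times 100, rounded to a whole number. A minimal sketch of that calculation as a reusable function follows. The name `roi_percent` and the sample figures are hypothetical and are not rows from `Drama_DataFrame`; the zero-budget guard is an added assumption that the loop above does not include.

```python
def roi_percent(profit, budget):
    """Return profit as a whole-number percentage of budget.

    Hypothetical helper: mirrors (Profit / Production_Budget) * 100
    followed by int(round(..., 0)) from the loop above.
    """
    if budget == 0:
        raise ValueError("budget must be non-zero")
    return int(round(profit / budget * 100, 0))


# Illustrative figures only (not taken from the dataset).
print(roi_percent(3_500_000, 1_000_000))   # 350
print(roi_percent(-200_000, 1_000_000))    # -20
```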
Converting the integer ROI values of all 'R' rated movies to percentage strings.
r_roi_percent = []
for i in r_percent_profit:
    r_roi_percent.append("{:}%".format(i))
print(r_roi_percent) #showing the r_roi_percent list
['-5%', '350%', '-20%', '504%', '40%', '593%', '575%', '36%', '156%', '1327%', '35%', '-29%', '418%', '-73%', '238%', '44%', '38%', '-82%', '81%', '133%', '41%', '-81%', '-57%', '-49%', '-53%', '-26%', '-96%', '2448%', '195%', '179%', '-9%', '65%', '199%', '0%', '263%', '411%', '601%', '-78%', '132%', '815%', '54%', '-95%', '250%', '-44%', '258%', '5%', '1332%', '-83%', '1056%', '501%', '1081%', '675%', '731%', '707%', '465%', '408%', '4%', '216%', '970%', '850%', '1557%', '444%', '16%', '-15%', '-75%', '218%', '2670%', '813%', '808%', '74%', '1852%', '-98%', '-62%', '-58%', '21%', '13%', '22%']
Checking the number of elements in the 'r_roi_percent' list.
len(r_roi_percent)
77
Turning the whole-number part of each movie's average rating into a string of stars for all 'R' rated movies.
r_stars = []
for i in r_rating:
    r_stars.append('*'*int(i))
print(r_stars) #showing the r_stars list
['*****', '********', '*****', '********', '*****', '****', '****', '******', '*******', '****', '*******', '*******', '*******', '******', '*******', '*******', '*******', '*****', '******', '******', '*******', '*****', '*******', '******', '******', '*******', '******', '********', '*******', '********', '******', '*******', '******', '******', '*******', '******', '*******', '********', '*******', '*******', '*******', '*******', '*****', '******', '********', '*******', '*******', '******', '******', '*******', '********', '*******', '*****', '*******', '*******', '******', '*******', '****', '*******', '******', '*******', '******', '******', '*******', '***', '******', '******', '*******', '******', '****', '******', '******', '*****', '******', '******', '*****', '******']
Checking the number of elements in the 'r_stars' list.
len(r_stars)
77
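Note that `int(i)` truncates, so a rating of 8.7 is shown as eight stars rather than nine. The sketch below contrasts truncation with rounding. `stars` is a hypothetical helper name, and whether to round is a presentation choice that the notebook does not state.

```python
def stars(rating, round_half_up=False):
    """Render a 0-10 rating as a string of '*' characters."""
    # int() truncates toward zero; adding 0.5 first rounds halves up.
    count = int(rating + 0.5) if round_half_up else int(rating)
    return '*' * count


print(stars(8.7))                      # 8 stars (truncated, as in the notebook)
print(stars(8.7, round_half_up=True))  # 9 stars (rounded)
```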
Creating the 'R' rated dataframe with the variables previously created.
system_rating_r = pd.DataFrame({"Name of Movie":r_name, "Cost":r_cost,
                                "Return On Investment":r_return_on_investment,
                                "ROI Percentage":r_roi_percent,"Ratings":r_stars})
The 'system_rating_r' dataframe. (this dataframe is interactive)
system_rating_r
Getting the index of all the movies whose ROI percentage is zero or negative.
neg_values = []
for i,x in enumerate(r_percent_profit):
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[0, 2, 11, 13, 17, 21, 22, 23, 24, 25, 26, 30, 33, 37, 41, 43, 47, 63, 64, 71, 72, 73]
Checking the number of elements in the 'neg_values' list.
len(neg_values)
22
Dropping the negative values and resetting the index of the system_rating_r dataframe.
system_rating_r = system_rating_r.drop(labels=neg_values, axis=0)
system_rating_r = system_rating_r.reset_index(drop=True)
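An alternative to collecting the indexes and dropping them afterwards is to filter the parallel lists before the dataframe is built. The sketch below shows the idea with plain lists. `keep_profitable` and the three-movie sample are hypothetical and only illustrate the filtering step.

```python
def keep_profitable(percent_profit, *columns):
    """Drop every position whose ROI percentage is zero or negative.

    Returns filtered copies of percent_profit and of each parallel list,
    in the same order they were passed in.
    """
    keep = [i for i, pct in enumerate(percent_profit) if pct > 0]
    return [[col[i] for i in keep] for col in (percent_profit, *columns)]


# Hypothetical sample: only the second movie made a profit.
pct = [-5, 350, 0]
names = ['Movie A', 'Movie B', 'Movie C']
pct_kept, names_kept = keep_profitable(pct, names)
print(pct_kept, names_kept)  # [350] ['Movie B']
```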
The new 'system_rating_r' dataframe. (this dataframe is interactive)
system_rating_r
Dividing the system_rating_r dataframe into three dataframes.
System_rating_r1 is the first dataframe. (this dataframe is interactive)
system_rating_r1=system_rating_r[:19]
system_rating_r1
System_rating_r2 is the second dataframe. (this dataframe is interactive)
system_rating_r2=system_rating_r[19:37]
system_rating_r2
System_rating_r3 is the third dataframe. (this dataframe is interactive)
system_rating_r3=system_rating_r[37:]
system_rating_r3
Getting the average Budget of all the 'R' rated movies in the Drama genre.
r_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_r['Cost']]) / len(system_rating_r['Cost'])
The average Budget of all the 'R' rated Drama movies is $16,455,866.
r_avg_value
16455866.363636363
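The same `replace('$', '').replace(',', '')` conversion is repeated for every average in this section. A minimal sketch of it as a pair of small helpers is given below. `parse_currency` and `average_currency` are hypothetical names, and the sample list is illustrative rather than a column from the dataframe.

```python
def parse_currency(text):
    """Convert a string such as '$-7,365,642' to the integer -7365642."""
    return int(text.replace('$', '').replace(',', ''))


def average_currency(values):
    """Average a list of currency strings after parsing each one."""
    amounts = [parse_currency(v) for v in values]
    return sum(amounts) / len(amounts)


# Illustrative sample only.
sample = ['$1,500,000', '$2,000,000', '$-500,000']
print(parse_currency('$-7,365,642'))  # -7365642
print(average_currency(sample))       # 1000000.0
```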
Getting the index of all the movies that are at or below the average Budget of all the 'R' rated Drama movies (r_below_avg1), and of all the movies that are at or above it (r_below_avg2).
r_cost_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_r['Cost']]
#at or below avg
r_below_avg1 = []
for i,x in enumerate(r_cost_index):
    if x <= 16455866: r_below_avg1.append(i)
#at or above avg (the list keeps the name r_below_avg2 even though it holds the above-average rows)
r_below_avg2 = []
for i,x in enumerate(r_cost_index):
    if x >= 16455866: r_below_avg2.append(i)
The 'r_below_avg1' list.
print(r_below_avg1)
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54]
The 'r_below_avg2' list.
print(r_below_avg2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 50]
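Because both comparisons include equality and the threshold is typed in by hand, a value exactly equal to the threshold would appear in both lists. The sketch below is one way to avoid that, under the assumption that a value equal to the average should count as "at or above". `split_by_average` and the sample costs are hypothetical.

```python
def split_by_average(values):
    """Return (average, positions below it, positions at or above it)."""
    avg = sum(values) / len(values)
    below = [i for i, v in enumerate(values) if v < avg]
    at_or_above = [i for i, v in enumerate(values) if v >= avg]
    return avg, below, at_or_above


# Hypothetical costs; two of them equal the average exactly.
costs = [10, 20, 30, 20]
avg, below, at_or_above = split_by_average(costs)
print(avg, below, at_or_above)  # 20.0 [0] [1, 2, 3]
```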
Getting the average Return On Investment of all the 'R' rated movies in the Drama genre.
r_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_r['Return On Investment']]) / len(system_rating_r['Cost'])
The average Return On Investment of all the 'R' rated Drama movies is $59,600,710.
r_avg_value
59600710.27272727
Getting the index of all the movies that are at or below the average Return On Investment of all the 'R' rated Drama movies (r_below_avg3), and of all the movies that are at or above it (r_below_avg4).
r_roi_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_r['Return On Investment']]
#at or below avg
r_below_avg3 = []
for i,x in enumerate(r_roi_index):
    if x <= 59600710: r_below_avg3.append(i)
#at or above avg (the list keeps the name r_below_avg4 even though it holds the above-average rows)
r_below_avg4 = []
for i,x in enumerate(r_roi_index):
    if x >= 59600710: r_below_avg4.append(i)
The 'r_below_avg3' list.
print(r_below_avg3)
[2, 5, 8, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54]
The 'r_below_avg4' list.
print(r_below_avg4)
[0, 1, 3, 4, 6, 7, 9, 16, 23, 25, 49]
Getting the average Return On Investment Percentage of all the 'R' rated movies in the Drama genre.
r_avg_value = sum([int(i.replace('%', ''))
for i in system_rating_r['ROI Percentage']]) / len(system_rating_r['Cost'])
The average Return On Investment Percentage of all the 'R' rated Drama movies is about 509%.
r_avg_value
508.8727272727273
Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'R' rated Drama movies.
roi_percent_index_r = [int(i.replace('%', '')) for i in system_rating_r['ROI Percentage']]
#at or above avg
r_above_avg = []
for i,x in enumerate(roi_percent_index_r):
    if x >= 508: r_above_avg.append(i)
The 'r_above_avg' list.
r_above_avg
[3, 4, 7, 16, 23, 25, 30, 31, 33, 34, 35, 36, 41, 42, 43, 47, 48, 49, 51]
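These indexes refer to row positions in the full `system_rating_r` dataframe, which was split earlier into the slices `[:19]`, `[19:37]` and `[37:]`. Positions inside each slice can be derived from the full list instead of being typed by hand. The sketch below shows one way to do that; `local_positions` is a hypothetical helper, and the list is copied from the output above.

```python
def local_positions(indexes, start, stop=None):
    """Translate row positions in the full dataframe into positions
    inside the slice [start:stop]."""
    return [i - start for i in indexes
            if i >= start and (stop is None or i < stop)]


# Copied from the r_above_avg output above.
r_above_avg = [3, 4, 7, 16, 23, 25, 30, 31, 33, 34, 35, 36,
               41, 42, 43, 47, 48, 49, 51]

print(local_positions(r_above_avg, 0, 19))   # [3, 4, 7, 16]
print(local_positions(r_above_avg, 19, 37))  # [4, 6, 11, 12, 14, 15, 16, 17]
print(local_positions(r_above_avg, 37))      # [4, 5, 6, 10, 11, 12, 14]
```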
After getting the indexes of the movies that fit the criteria, they are used to style the dataframes and to highlight particular cells. Functions are created to carry out this objective. There are seven main functions that will be used to get the expected results on the dataframes.
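Each function passed to `Styler.apply(..., axis=None)` receives the whole dataframe and must return a grid of CSS strings with the same shape, where an empty string means "no style". The sketch below shows that contract with plain lists so that it runs without pandas. `style_grid` is a hypothetical helper; in the notebook the grid would be a dataframe sharing the index and columns of the one being styled.

```python
def style_grid(n_rows, n_cols, rows, col, css):
    """Return an n_rows x n_cols grid of CSS strings.

    Cells (r, col) for every r in rows receive `css`; all other cells
    are ''. Row positions outside the grid are ignored.
    """
    grid = [['' for _ in range(n_cols)] for _ in range(n_rows)]
    for r in rows:
        if 0 <= r < n_rows:
            grid[r][col] = css
    return grid


# Highlight column 1 of rows 0 and 2 in a hypothetical 3 x 5 table;
# row 7 does not exist and is skipped.
grid = style_grid(3, 5, rows=[0, 2, 7], col=1,
                  css='background-color:yellow;color:black')
for row in grid:
    print(row)
```

One parameterized function of this kind could stand in for several near-identical highlight functions that differ only in the row list, the column and the CSS string.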
Styling system_rating_r1 using the styling functions below and the indexes computed above.
def Ratings1(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(19):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight2(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(19):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df
def highlight_cells3(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in r_below_avg1[:3]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells4(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in r_below_avg2[:-1]:
        df.iloc[i,1] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells5(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in r_below_avg3[:11]:
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells6(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in r_below_avg4[:8]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells7(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in r_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_r1 = (
    system_rating_r1.style.apply(Ratings_highlight2, axis=None)
    .set_table_styles([{'selector' : '','props' : [('border','3px solid #ff5500')]},
        {"selector":"thead", 'props':[("background-color","white"),("color","#ff5500")]},#heading
        #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
        {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#ff5500')]}#index
    ])
    .apply(Ratings1, axis=None)
    .apply(highlight_cells3, axis=None)
    .apply(highlight_cells4, axis=None)
    .apply(highlight_cells5, axis=None)
    .apply(highlight_cells6, axis=None)
    .apply(highlight_cells7, axis=None)
    #.set_table_attributes("style='display:inline'")
    #.set_caption('Caption table 1')
)
The 'system_rating_r1' dataframe.
Styling system_rating_r2 using the styling functions below and the indexes computed above.
def Ratings8(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):#range(19,37):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight9(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df
def highlight_cells10(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):#below_avg1[3:21]
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells11(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):#below_avg3[11:27]:
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells12(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [4, 6]:#below_avg3[11:27]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells13(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [4, 6, 11, 12, 14, 15, 16, 17]:#above_avg[4:12]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_r2 = (
    system_rating_r2.style.apply(Ratings_highlight9, axis=None)
    .set_table_styles([{'selector' : '','props' : [('border','3px solid #ff5500')]},
        {"selector":"thead", 'props':[("background-color","white"),("color","#ff5500")]},#heading
        #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
        {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#ff5500')]}#index
    ])
    .apply(Ratings8, axis=None)
    .apply(highlight_cells10, axis=None)
    .apply(highlight_cells11, axis=None)
    .apply(highlight_cells12, axis=None)
    .apply(highlight_cells13, axis=None)
    #.set_table_attributes("style='display:inline'")
    #.set_caption('Caption table 2')
    #.apply(borders, axis=None)
)
#display_html(df1_style._repr_html_() + df2_style._repr_html_(), raw=True)
The 'system_rating_r2' dataframe.
Styling system_rating_r3 using the styling functions below and the indexes computed above.
def Ratings14(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight15(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#ff5500;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df
def highlight_cells16(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells17(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[13,1] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells18(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(18):
        df.iloc[i,2] = 'background-color:#ff5500;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells19(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[12,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells20(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [4, 5, 6, 10, 11, 12, 14]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_r3 = (
    system_rating_r3.style.apply(Ratings_highlight15, axis=None)
    .apply(Ratings14, axis=None)
    .apply(highlight_cells16, axis=None)
    .apply(highlight_cells17, axis=None)
    .apply(highlight_cells18, axis=None)
    .apply(highlight_cells19, axis=None)
    .apply(highlight_cells20, axis=None)
    .set_table_attributes("style='display:inline'")
    .set_table_styles([{'selector' : '','props' : [('border','3px solid #ff5500')]},
        {"selector":"thead", 'props':[("background-color","white"),("color","#ff5500")]},#heading
        #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
        {'selector':'th.row_heading', 'props':[('background-color','white'),('color','#ff5500')]}#index
    ])
    .set_properties(**{'text-align': 'center'})
    #.set_caption('Caption table 3')
    #.apply(borders, axis=None)
)
The 'system_rating_r3' dataframe.
Saving the System_rating_r1 dataframe to the System_rating_r1.png file as an image to be used for the analysis later on.
dfi.export(system_rating_r1, 'system_rating_r1.png')
Saving the System_rating_r2 dataframe to the System_rating_r2.png file as an image to be used for the analysis later on.
dfi.export(system_rating_r2, 'system_rating_r2.png')
Saving the System_rating_r3 dataframe to the System_rating_r3.png file as an image to be used for the analysis later on.
dfi.export(system_rating_r3, 'system_rating_r3.png')
This function allows all three dataframes to be displayed side by side in the analysis below.
def display_side_by_side(*args):
    html_str = "<center><font size=6 style='color:#ff5500'>The Return On Investment on R-rated Movies.</font></center> <br> "
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'),
        raw=True
    )
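The `replace('table', ...)` call above rewrites every occurrence of the word "table" in the combined HTML, including the closing `</table>` tags. A narrower variant that targets only opening tags is sketched below. `inline_tables` is a hypothetical helper, and the two fragments are hand-written stand-ins for the HTML that the styled dataframes would produce.

```python
def inline_tables(html):
    """Add an inline display style to every opening <table tag only."""
    return html.replace('<table', '<table style="display:inline"')


# Hand-written stand-ins for the output of each styled dataframe.
fragments = ['<table id="a"><tr><td>1</td></tr></table>',
             '<table id="b"><tr><td>2</td></tr></table>']
combined = inline_tables(''.join(fragments))
print(combined)
```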
Below will be the start of the creation of dataframes that are in the 'Drama Genre' that are 'G-rated' based on the 'ROI' of each movie.
Index of all the 'G' rated movies.
g_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'G': g_index.append(i)
print(g_index) #showing the g_index list
[227, 228, 229, 230, 231, 232, 233, 234, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270]
Checking the number of elements in the 'g_index' list.
len(g_index)
34
Getting the Cost for all 'G' rated movies.
g_cost = []
for i in g_index:
    g_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(g_cost) #showing the g_cost list
['$35,446,775', '$700,000', '$8,600,000', '$7,000,000', '$18,000,000', '$4,400,000', '$17,000,000', '$22,000,000', '$20,000,000', '$23,000,000', '$15,000,000', '$2,700,000', '$70,000,000', '$30,000,000', '$2,500,000', '$90,000,000', '$666,000', '$85,000,000', '$17,000,000', '$10,000,000', '$22,000,000', '$18,000,000', '$8,200,000', '$60,000,000', '$45,000,000', '$858,000', '$17,000,000', '$300,000', '$10,000,000', '$6,400,000', '$13,000,000', '$1,750,000', '$1,700,000', '$3,000,000']
Checking the number of elements in the 'g_cost' list.
len(g_cost)
34
Getting the Name for all 'G' rated movies.
g_name = []
for i in g_index:
    g_name.append(Drama_DataFrame.Movie[i])
print(g_name) #showing the g_name list
['La traviata', 'A Sunday in the Country', 'Little Dorrit', 'Prancer', 'The Secret Garden', 'Through the Olive Trees', 'A Little Princess', 'The Rookie', 'Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna', 'Babe: Pig in the City', 'Lassie Come Home', "Charlotte's Web", 'A Little Princess', 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', 'Before the Wrath', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Cions in the Fountain', 'Miracle of Marcelino']
Checking the number of elements in the 'g_name' list.
len(g_name)
34
Getting the Return On Investment (the dollar profit stored in Profit_x) for all 'G' rated movies.
g_return_on_investment = []
for i in g_index:
    g_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(g_return_on_investment) #showing the g_return_on_investment list
['$-35,251,281', '$1,711,143', '$-7,574,772', '$11,587,135', '$-9,278,757', '$-4,359,700', '$-6,984,551', '$58,693,537', '$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$-20,868,140', '$3,851,000', '$58,985,708', '$-6,984,551', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$-191,002', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000', '$-2,407,139']
Checking the number of elements in the 'g_return_on_investment' list.
len(g_return_on_investment)
34
Getting the Ratings of all 'G' rated movies.
g_rating = []
for i in g_index:
    g_rating.append(Drama_DataFrame.Averagerating[i])
print(g_rating) #showing the g_rating list
[7.2, 7.6, 7.3, 6.4, 7.3, 7.8, 7.7, 6.9, 8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]
Checking the number of elements in the 'g_rating' list.
len(g_rating)
34
Getting the Profit Percentage of all 'G' rated movies.
g_percent_profit = []
for i in g_index:
    percent = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    g_percent_profit.append(int(round(percent,0)))
print(g_percent_profit) #showing the g_percent_profit list
[-99, 244, -88, 166, -52, -99, -41, 267, 2093, 191, 83, 1300, 365, 720, 50, -23, 578, 69, -41, 77, 266, 1629, 3390, 51, 2092, 31135, 324, -64, 377, 372, 404, 334, 606, -80]
Checking the number of elements in the 'g_percent_profit' list.
len(g_percent_profit)
34
Converting the integer ROI values of all 'G' rated movies to percentage strings.
g_roi_percent = []
for i in g_percent_profit:
    g_roi_percent.append("{:}%".format(i))
print(g_roi_percent) #showing the g_roi_percent list
['-99%', '244%', '-88%', '166%', '-52%', '-99%', '-41%', '267%', '2093%', '191%', '83%', '1300%', '365%', '720%', '50%', '-23%', '578%', '69%', '-41%', '77%', '266%', '1629%', '3390%', '51%', '2092%', '31135%', '324%', '-64%', '377%', '372%', '404%', '334%', '606%', '-80%']
Checking the number of elements in the 'g_roi_percent' list.
len(g_roi_percent)
34
Turning the whole-number part of each movie's average rating into a string of stars for all 'G' rated movies.
g_stars = []
for i in g_rating:
    g_stars.append('*'*int(i))
print(g_stars) #showing the g_stars list
['*******', '*******', '*******', '******', '*******', '*******', '*******', '******', '********', '******', '******', '*******', '*******', '*********', '*********', '*****', '*******', '******', '*******', '******', '******', '*******', '********', '******', '********', '*******', '*******', '******', '********', '*******', '*******', '*******', '******', '*******']
Checking the number of elements in the 'g_stars' list.
len(g_stars)
34
Creating the 'G' rated dataframe with the variables previously created.
system_rating_g = pd.DataFrame({"Name of Movie":g_name, "Cost":g_cost,
                                "Return On Investment":g_return_on_investment,
                                "ROI Percentage":g_roi_percent,"Ratings":g_stars})
The 'system_rating_g' dataframe. (this dataframe is interactive)
system_rating_g
Getting the index of all the movies whose ROI percentage is zero or negative.
neg_values = []
for i,x in enumerate(g_percent_profit):
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[0, 2, 4, 5, 6, 15, 18, 27, 33]
Checking the number of elements in the 'neg_values' list.
len(neg_values)
9
Dropping the negative values and resetting the index of the system_rating_g dataframe.
system_rating_g = system_rating_g.drop(labels=neg_values, axis=0)
system_rating_g = system_rating_g.reset_index(drop=True)
This is the System_rating_g dataframe. It will be divided into two dataframes.
system_rating_g
System_rating_g1 is the first dataframe. (this dataframe is interactive)
system_rating_g1=system_rating_g[:12]
system_rating_g1
System_rating_g2 is the second dataframe. (this dataframe is interactive)
system_rating_g2=system_rating_g[12:]
system_rating_g2
Getting the average Budget of all the 'G' rated movies in the Drama genre.
g_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_g['Cost']]) / len(system_rating_g['Cost'])
The average Budget of all the 'G' rated Drama movies is $19,698,960.
g_avg_value
19698960.0
Getting the index of all the movies that are at or below the average Budget of all the 'G' rated Drama movies (g_below_avg5), and of all the movies that are at or above it (g_below_avg6).
g_cost_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_g['Cost']]
#at or below avg
g_below_avg5 = []
for i,x in enumerate(g_cost_index):
    if x <= 19698960: g_below_avg5.append(i)
#at or above avg (the list keeps the name g_below_avg6 even though it holds the above-average rows)
g_below_avg6 = []
for i,x in enumerate(g_cost_index):
    if x >= 19698960: g_below_avg6.append(i)
The 'g_below_avg5' list.
print(g_below_avg5)
[0, 1, 5, 6, 9, 10, 12, 14, 15, 18, 19, 20, 21, 22, 23, 24]
The 'g_below_avg6' list.
print(g_below_avg6)
[2, 3, 4, 7, 8, 11, 13, 16, 17]
Getting the average Return On Investment of all the 'G' rated movies in the Drama genre.
g_avg_value = sum([int(i.replace('$', '').replace(',', ''))
                   for i in system_rating_g['Return On Investment']]) / len(system_rating_g['Cost'])
The average Return On Investment of all the 'G' rated Drama movies is $127,174,411.
g_avg_value
127174411.52
Getting the index of all the movies that are at or below the average Return On Investment of all the 'G' rated Drama movies (g_below_avg7), and of all the movies that are at or above it (g_below_avg8).
g_roi_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_g['Return On Investment']]
#at or below avg
g_below_avg7 = []
for i,x in enumerate(g_roi_index):
    if x <= 127174411: g_below_avg7.append(i)
#at or above avg (the list keeps the name g_below_avg8 even though it holds the above-average rows)
g_below_avg8 = []
for i,x in enumerate(g_roi_index):
    if x >= 127174411: g_below_avg8.append(i)
The 'g_below_avg7' list.
print(g_below_avg7)
[0, 1, 2, 4, 5, 6, 9, 10, 11, 12, 13, 16, 19, 20, 21, 22, 23, 24]
The 'g_below_avg8' list.
print(g_below_avg8)
[3, 7, 8, 14, 15, 17, 18]
Getting the average Return On Investment Percentage of all the 'G' rated movies in the Drama genre.
g_avg_value = sum([int(i.replace('%', ''))
for i in system_rating_g['ROI Percentage']]) / len(system_rating_g['Cost'])
The average Return On Investment Percentage of all the 'G' rated Drama movies is 1887%.
g_avg_value
1887.32
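The 1887% mean is pulled upward by a single value of 31135%. As a hedged cross-check, not part of the original analysis, the sketch below recomputes the mean from the positive entries of `g_percent_profit` printed earlier and compares it with the median, which is less sensitive to one extreme value. The list name `g_positive_roi` is introduced here for illustration.

```python
from statistics import median

# Positive entries of g_percent_profit, copied from the output above.
g_positive_roi = [244, 166, 267, 2093, 191, 83, 1300, 365, 720, 50, 578, 69, 77,
                  266, 1629, 3390, 51, 2092, 31135, 324, 377, 372, 404, 334, 606]

mean_roi = sum(g_positive_roi) / len(g_positive_roi)
median_roi = median(g_positive_roi)

print(round(mean_roi, 2))  # 1887.32
print(median_roi)          # 365
```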
Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'G' rated Drama movies.
roi_percent_index_g = [int(i.replace('%', '')) for i in system_rating_g['ROI Percentage']]
#at or above avg
g_above_avg = []
for i,x in enumerate(roi_percent_index_g):
    if x >= 1887: g_above_avg.append(i)
The 'g_above_avg' list.
g_above_avg
[3, 15, 17, 18]
Styling system_rating_g1 using the styling functions below and the indexes computed above.
def Ratings21(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(12):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight22(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(12):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:red;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells23(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in g_below_avg5[:6]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells24(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in g_below_avg6[:6]:
        df.iloc[i,1] = 'background-color:red;color:white;border-bottom:2px solid black'
    return df
def highlight_cells25(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in g_below_avg7[:9]:
        df.iloc[i,2] = 'background-color:red;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells26(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in g_below_avg8[:3]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells27(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[3,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_g1 = (
    system_rating_g1.style.apply(Ratings_highlight22, axis=None)
    .set_table_styles([{'selector' : '','props' : [('border','3px solid red')]},
        {"selector":"thead", 'props':[("background-color","white"),("color","red")]},#heading
        #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
        {'selector':'th.row_heading', 'props':[('background-color','white'),('color','red')]}#index
    ])
    .apply(Ratings21, axis=None)
    .apply(highlight_cells23, axis=None)
    .apply(highlight_cells24, axis=None)
    .apply(highlight_cells25, axis=None)
    .apply(highlight_cells26, axis=None)
    .apply(highlight_cells27, axis=None)
    #.set_table_attributes("style='display:inline'")
    #.set_caption('Caption table 1')
)
The 'system_rating_g1' dataframe.
Styling system_rating_g2 using the styling functions below and the indexes computed above.
def Ratings28(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(13):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight29(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(13):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:red;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells30(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 2, 3, 6, 7, 8, 9, 10, 11, 12]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells31(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [1,4,5]:
        df.iloc[i,1] = 'background-color:red;color:white;border-bottom:2px solid black'
    return df
def highlight_cells32(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 1, 4, 7, 8, 9, 10, 11, 12]:
        df.iloc[i,2] = 'background-color:red;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells33(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [2, 3, 5, 6]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells34(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [3, 5, 6]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_g2=system_rating_g2.style.apply(Ratings_highlight29, axis=None)\
.set_table_styles([{'selector' : '','props' : [('border','3px solid red')]},
{"selector":"thead", 'props':[("background-color","white"),("color","red")]},# heading
#{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
{'selector':'th.row_heading', 'props':[('background-color','white'),('color','red')]}#index
])\
.apply(Ratings28, axis=None)\
.apply(highlight_cells30, axis=None)\
.apply(highlight_cells31, axis=None)\
.apply(highlight_cells32, axis=None)\
.apply(highlight_cells33, axis=None)\
.apply(highlight_cells34, axis=None)\
#.set_table_attributes("style='display:inline'")\
#.set_caption('Caption table 1')
The 'system_rating_g2' dataframe.
Saving the styled system_rating_g1 dataframe to system_rating_g1.png as an image to be used later in the analysis.
dfi.export(system_rating_g1, 'system_rating_g1.png')
Saving the styled system_rating_g2 dataframe to system_rating_g2.png as an image to be used later in the analysis.
dfi.export(system_rating_g2, 'system_rating_g2.png')
This function displays the two dataframes side by side.
def display_side_by_side2(*args):
    html_str = "<center><font size=6 style='color:red'>The Return On Investment on G-rated Movies.</font></center> <br> "
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'),
        raw=True
    )
The cells below build the dataframes for the 'PG' rated movies in the Drama genre.
Getting the index of all the 'PG' rated movies.
pg_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG':pg_index.append(i)
print(pg_index) #showing the pg_index list
[0, 31, 40, 61, 62, 129, 141, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213]
Checking the number of elements in the 'pg_index' list.
len(pg_index)
67
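The index loop above can also be written as a boolean mask. The sketch below is an alternative under stated assumptions, not the notebook's own method: `toy` is a hypothetical stand-in for `Drama_DataFrame`, and only its `Rating` column is modelled.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Drama_DataFrame; only the Rating column matters here.
toy = pd.DataFrame({"Rating": ["PG", "R", "PG-13", "PG", "G"]})

# Boolean mask of the rows rated exactly 'PG', then their integer positions.
is_pg = (toy["Rating"] == "PG").to_numpy()
pg_positions = np.flatnonzero(is_pg).tolist()
print(pg_positions)
```

The positions match what `enumerate` would give only when the rows are in their original order, which is also what the loop version relies on.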
Getting the Cost for all 'PG' rated movies.
pg_cost = []
for i in pg_index:
    pg_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(pg_cost) #showing the pg_cost list
['$180,000,000', '$37,000,000', '$31,000,000', '$20,000,000', '$20,000,000', '$3,000,000', '$1,700,000', '$5,100,000', '$10,000,000', '$95,000,000', '$3,000,000', '$20,000,000', '$40,000,000', '$5,000,000', '$422,000', '$5,100,000', '$72,000,000', '$11,800,000', '$15,000,000', '$32,000,000', '$40,000,000', '$65,000,000', '$8,000,000', '$9,000,000', '$17,000,000', '$30,000,000', '$500,000', '$20,000,000', '$11,000,000', '$2,000,000', '$23,000,000', '$45,000,000', '$15,000,000', '$10,000,000', '$32,000,000', '$90,000,000', '$10,000,000', '$27,000,000', '$16,000,000', '$3,000,000', '$15,000,000', '$25,000,000', '$34,000,000', '$10,000,000', '$20,000,000', '$15,000,000', '$12,000,000', '$5,000,000', '$7,000,000', '$14,000,000', '$15,000,000', '$12,000,000', '$28,300,000', '$8,000,000', '$7,500,000', '$17,000,000', '$5,000,000', '$9,000,000', '$15,000,000', '$22,000,000', '$5,000,000', '$4,500,000', '$4,500,000', '$8,000,000', '$16,000,000', '$8,200,000', '$28,000,000']
Checking the number of elements in the 'pg_cost' list.
len(pg_cost)
67
Getting the Name for all 'PG' rated movies.
pg_name = []
for i in pg_index:
    pg_name.append(Drama_DataFrame.Movie[i])
print(pg_name) #showing the pg_name list
['Hugo', 'Dolphin Tale', 'Extraordinary Measures', 'Wonder', 'The Last Song', 'War Room', 'The Lunchbox', 'Somewhere in Time', 'Urban Cowboy', 'Cinderella', 'War Room', 'Wonder', 'Little Women', 'Overcomer', 'The Jazz Singer', 'Cattle Annie and Little Britches', 'The Majestic', 'A Walk to Remember', 'Tuck Everlasting', 'Dreamer', 'The Lake House', 'We Are Marshall', 'Akeelah and the Bee', 'The Ultimate Gift', 'Bridge to Terabithia', 'August Rush', 'Fireproof', 'The Last Song', 'What If...', "God's Not Dead", "Mr. Holland's Opus", 'The Indian in the Cupboard', 'Fluke', 'Three Wishes', 'Phenomenon', 'Contact', 'The Spanish Prisoner', 'Music of the Heart', 'Sense and Sensibility', 'The Secret of Roan Inish', 'The Remains of the Day', 'Gettysburg', 'The Age of Innocence', 'Pure Country', 'Forever Young', 'Newsies', 'A River Runs Through It', 'Honeysuckle Rose', 'Resurrection', 'Taps', 'On Golden Pond', 'Absence of Malice', 'Ragtime', 'Looker', 'The Night the Lights Went Out in Georgia', 'Rocky III', 'Tex', 'Six Weeks', 'Five Days One Summer', 'Staying Alive', 'Eddie and the Cruisers', 'Tender Mercies', 'Testament', 'Table for Five', 'Man, Woman and Child', 'Footloose', 'The Natural']
Checking the number of elements in the 'pg_name' list.
len(pg_name)
67
Getting the ROI (the dollar profit stored in the Profit_x column) for all 'PG' rated movies.
pg_return_on_investment = []
for i in pg_index:
    pg_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(pg_return_on_investment) #showing the pg_return_on_investment list
['$47,784', '$59,068,724', '$-15,173,016', '$284,604,712', '$72,678,948', '$70,975,239', '$10,531,500', '$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$-4,565,184', '$-34,693,666', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$-21,454,636', '$10,948,425', '$-5,561,265', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$-2,473,712', '$62,667,874', '$83,269,971', '$-9,343,870', '$-11,012,232', '$-2,974,504', '$120,036,382', '$81,120,329', '$3,835,130', '$-12,140,606', '$118,582,776', '$3,101,815', '$48,954,968', '$-14,230,040', '$-1,744,560', '$5,164,458', '$107,956,187', '$-12,180,515', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$-13,379,219', '$-4,718,768', '$7,423,752', '$108,052,686', '$544,368,315', '$-2,331,975', '$-14,800,922', '$42,892,670', '$-213,211', '$3,943,124', '$-2,455,108', '$-5,600,000', '$-14,294,092', '$71,808,942', '$20,000,000']
Checking the number of elements in the 'pg_return_on_investment' list.
len(pg_return_on_investment)
67
Getting the Ratings of all 'PG' rated movies.
pg_rating = []
for i in pg_index:
    pg_rating.append(Drama_DataFrame.Averagerating[i])
print(pg_rating) #showing the pg_rating list
[7.5, 6.9, 6.5, 8.0, 6.0, 6.5, 7.8, 7.2, 6.4, 6.9, 6.5, 8.0, 7.8, 6.6, 5.9, 6.1, 6.9, 7.3, 6.6, 6.8, 6.8, 7.1, 7.4, 7.3, 7.1, 7.5, 6.5, 6.0, 6.4, 4.7, 7.3, 6.0, 6.7, 6.1, 6.4, 7.5, 7.2, 6.8, 7.7, 7.5, 7.8, 7.6, 7.2, 7.0, 6.3, 6.9, 7.2, 6.3, 7.3, 6.8, 7.6, 6.9, 7.3, 6.1, 6.0, 6.8, 6.5, 5.7, 6.1, 4.7, 6.9, 7.4, 7.0, 6.1, 6.1, 6.6, 7.5]
Checking the number of elements in the 'pg_rating' list.
len(pg_rating)
67
Getting the Profit Percentage (profit divided by production budget, times 100) of all 'PG' rated movies.
pg_percent_profit = []
for i in pg_index:
    pct = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    pg_percent_profit.append(int(round(pct,0)))
print(pg_percent_profit) #showing the pg_percent_profit list
[0, 160, -49, 1423, 363, 2366, 620, 90, 369, 471, 2366, 1430, 442, 662, 6326, -90, -48, 302, 29, 21, 187, -33, 137, -62, 709, 115, 6595, 346, -22, 3133, 362, -21, -73, -30, 375, 90, 38, -45, 741, 103, 326, -57, -5, 52, 540, -81, 262, 256, 2147, 156, 695, 239, -47, -59, 99, 636, 10887, -26, -99, 195, -4, 88, -55, -70, -89, 876, 71]
Checking the number of elements in the 'pg_percent_profit' list.
len(pg_percent_profit)
67
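As a worked check of the percentage calculation, the sketch below applies the same formula to three cost and profit pairs taken from the printed lists. The helper name `roi_percent` is hypothetical and is not part of the notebook.

```python
def roi_percent(profit: float, budget: float) -> int:
    """Profit as a whole-number percentage of the production budget."""
    return int(round(profit / budget * 100, 0))

# (profit, budget) pairs copied from the pg_return_on_investment and pg_cost lists.
print(roi_percent(59_068_724, 37_000_000))    # Dolphin Tale
print(roi_percent(47_784, 180_000_000))       # Hugo
print(roi_percent(-15_173_016, 31_000_000))   # Extraordinary Measures
```

Hugo made a small profit, but its percentage rounds to 0, which is why it is later treated the same as the loss-making movies.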
Formatting the integer ROI percentages of all 'PG' rated movies as strings with a % sign.
pg_roi_percent = []
for i in pg_percent_profit:
    pg_roi_percent.append("{:}%".format(i))
print(pg_roi_percent) #showing the pg_roi_percent list
['0%', '160%', '-49%', '1423%', '363%', '2366%', '620%', '90%', '369%', '471%', '2366%', '1430%', '442%', '662%', '6326%', '-90%', '-48%', '302%', '29%', '21%', '187%', '-33%', '137%', '-62%', '709%', '115%', '6595%', '346%', '-22%', '3133%', '362%', '-21%', '-73%', '-30%', '375%', '90%', '38%', '-45%', '741%', '103%', '326%', '-57%', '-5%', '52%', '540%', '-81%', '262%', '256%', '2147%', '156%', '695%', '239%', '-47%', '-59%', '99%', '636%', '10887%', '-26%', '-99%', '195%', '-4%', '88%', '-55%', '-70%', '-89%', '876%', '71%']
Checking the number of elements in the 'pg_roi_percent' list.
len(pg_roi_percent)
67
Turning the average rating of each 'PG' rated movie into a string of stars, one star per whole rating point.
pg_stars = []
for i in pg_rating:
    pg_stars.append('*'*int(i))
print(pg_stars) #showing the pg_stars list
['*******', '******', '******', '********', '******', '******', '*******', '*******', '******', '******', '******', '********', '*******', '******', '*****', '******', '******', '*******', '******', '******', '******', '*******', '*******', '*******', '*******', '*******', '******', '******', '******', '****', '*******', '******', '******', '******', '******', '*******', '*******', '******', '*******', '*******', '*******', '*******', '*******', '*******', '******', '******', '*******', '******', '*******', '******', '*******', '******', '*******', '******', '******', '******', '******', '*****', '******', '****', '******', '*******', '*******', '******', '******', '******', '*******']
Checking the number of elements in the 'pg_stars' list.
len(pg_stars)
67
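The star strings depend on how `int()` treats the decimal part of a rating. A minimal sketch, using a hypothetical helper `stars` and two ratings from the printed `pg_rating` list:

```python
def stars(rating: float) -> str:
    """One '*' per whole rating point; int() drops the decimal part."""
    return "*" * int(rating)

print(stars(7.5))  # seven stars, the .5 is discarded
print(stars(4.7))  # four stars, not five
```

Because the conversion truncates rather than rounds, a 7.9 and a 7.0 are drawn with the same number of stars.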
Creating the 'PG' rated dataframe with the variables previously created.
system_rating_pg = pd.DataFrame({"Name of Movie":pg_name, "Cost":pg_cost,
"Return On Investment":pg_return_on_investment,
"ROI Percentage":pg_roi_percent,"Ratings":pg_stars})
The 'system_rating_pg' dataframe. (this dataframe is interactive)
system_rating_pg
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
Getting the index of all the movies whose ROI percentage is zero or negative.
neg_values = []
for i,x in enumerate(pg_percent_profit):
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[0, 2, 15, 16, 21, 23, 28, 31, 32, 33, 37, 41, 42, 45, 52, 53, 57, 58, 60, 62, 63, 64]
Checking the number of elements in the 'neg_values' list.
len(neg_values)
22
Dropping the rows with a zero or negative ROI percentage and resetting the index of the system_rating_pg dataframe.
system_rating_pg = system_rating_pg.drop(labels=neg_values, axis=0)
system_rating_pg = system_rating_pg.reset_index(drop=True)
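The same filtering can be done without a separate index list by building a mask from the percentage column. This is a sketch under assumptions: `toy` is a hypothetical three-row stand-in for `system_rating_pg`, with values copied from the printed lists.

```python
import pandas as pd

# Hypothetical stand-in for system_rating_pg, with the same column names.
toy = pd.DataFrame({
    "Name of Movie": ["Hugo", "Dolphin Tale", "Extraordinary Measures"],
    "ROI Percentage": ["0%", "160%", "-49%"],
})

# Strip the trailing % sign, convert to int, and keep only positive ROI rows.
roi = toy["ROI Percentage"].str.rstrip("%").astype(int)
kept = toy[roi > 0].reset_index(drop=True)
print(kept)
```

Filtering on the column itself avoids relying on `neg_values` still lining up with the dataframe's row labels.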
The new 'system_rating_pg' dataframe. It will be divided into two dataframes. (this dataframe is interactive)
system_rating_pg
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_pg1 is the first dataframe. (this dataframe is interactive)
system_rating_pg1=system_rating_pg[:22]
system_rating_pg1
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_pg2 is the second dataframe. (this dataframe is interactive)
system_rating_pg2=system_rating_pg[22:]
system_rating_pg2
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
Getting the average Budget of all the 'PG' rated movies in the Drama genre.
pg_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_pg['Cost']]) / len(system_rating_pg['Cost'])
The average Budget of all the 'PG' rated Drama movies is $18,060,488.
pg_avg_value
18060488.888888888
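The average above depends on turning strings such as '$37,000,000' into integers. A small sketch of that parsing as a reusable helper follows; `parse_dollars` is a hypothetical name, and the sample strings are copied from the printed lists.

```python
def parse_dollars(text: str) -> int:
    """Convert a string like '$37,000,000' or '$-15,173,016' to an int."""
    return int(text.replace("$", "").replace(",", ""))

sample_costs = ["$37,000,000", "$31,000,000"]
sample_avg = sum(parse_dollars(c) for c in sample_costs) / len(sample_costs)
print(sample_avg)
```

The same helper handles the negative profit strings, because the minus sign sits after the dollar sign and survives the replacements.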
Getting the index of all the movies whose budget is at or below the average Budget of all the 'PG' rated Drama movies. Getting the index of all the movies whose budget is at or above that average.
pg_cost_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_pg['Cost']]
# at or below the average budget
pg_below_avg5 = []
for i,x in enumerate(pg_cost_index):
    if x <= 18060488:pg_below_avg5.append(i)
# at or above the average budget (the list name is kept because the styling functions use it)
pg_below_avg6 = []
for i,x in enumerate(pg_cost_index):
    if x >= 18060488:pg_below_avg6.append(i)
The 'pg_below_avg5' list.
print(pg_below_avg5)
[3, 4, 5, 6, 8, 11, 12, 13, 14, 17, 18, 20, 22, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43]
The 'pg_below_avg6' list.
print(pg_below_avg6)
[0, 1, 2, 7, 9, 10, 15, 16, 19, 21, 23, 24, 25, 31, 41, 44]
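Both loops use inclusive comparisons, so a value exactly equal to the threshold would appear in both lists. A minimal sketch of a split that puts every row in exactly one list, using a hypothetical helper `split_by_average` and made-up budgets:

```python
def split_by_average(values):
    """Return (average, positions below it, positions at or above it)."""
    avg = sum(values) / len(values)
    below = [i for i, v in enumerate(values) if v < avg]
    at_or_above = [i for i, v in enumerate(values) if v >= avg]
    return avg, below, at_or_above

# Made-up budgets chosen so that one value equals the average exactly.
avg, below, at_or_above = split_by_average([10, 20, 30])
print(avg, below, at_or_above)
```

Computing the threshold inside the function also avoids retyping the average as a literal after it has been printed.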
Getting the average Return On Investment of all the 'PG' rated movies in the Drama genre.
pg_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_pg['Return On Investment']]) / len(system_rating_pg['Cost'])
The average Return On Investment of all the 'PG' rated Drama movies is $83,389,266.
pg_avg_value
83389266.8888889
Getting the index of all the movies that are at or below the average Return On Investment of all the 'PG' rated Drama movies. Getting the index of all the movies that are at or above the average Return On Investment of all the 'PG' rated Drama movies.
pg_roi_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_pg['Return On Investment']]
# at or below the average ROI
pg_below_avg7 = []
for i,x in enumerate(pg_roi_index):
    if x <= 83389266:pg_below_avg7.append(i)
# at or above the average ROI (the list name is kept because the styling functions use it)
pg_below_avg8 = []
for i,x in enumerate(pg_roi_index):
    if x >= 83389266:pg_below_avg8.append(i)
The 'pg_below_avg7' list.
print(pg_below_avg7)
[0, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 28, 29, 30, 32, 33, 35, 37, 38, 41, 42, 43, 44]
The 'pg_below_avg8' list.
print(pg_below_avg8)
[1, 7, 9, 10, 18, 24, 27, 31, 34, 36, 39, 40]
Getting the average Return On Investment Percentage of all the 'PG' rated movies in the Drama genre.
pg_avg_value = sum([int(i.replace('%', ''))
for i in system_rating_pg['ROI Percentage']]) / len(system_rating_pg['Cost'])
The average Return On Investment Percentage of all the 'PG' rated Drama movies is 1064%.
pg_avg_value
1064.3555555555556
Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'PG' rated Drama movies.
roi_percent_index_pg = [int(i.replace('%', '')) for i in system_rating_pg['ROI Percentage']]
# at or above the average ROI percentage
pg_above_avg = []
for i,x in enumerate(roi_percent_index_pg):
    if x >= 1064:pg_above_avg.append(i)
The 'pg_above_avg' list.
print(pg_above_avg)
[1, 3, 8, 9, 12, 20, 22, 34, 40]
Styling system_rating_pg1 using the eight functions below and the index lists computed above.
def Ratings35(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight36(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#FA5F55;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells37(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg_below_avg5[:12]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells38(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg_below_avg6[:10]:
        df.iloc[i,1] = 'background-color:#FA5F55;color:white;border-bottom:2px solid black'
    return df
def highlight_cells39(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg_below_avg7[:17]:
        df.iloc[i,2] = 'background-color:#FA5F55;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells40(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg_below_avg8[:5]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells41(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg_above_avg[:6]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_pg1=system_rating_pg1.style.apply(Ratings_highlight36, axis=None)\
.set_table_styles([{'selector' : '','props' : [('border','3px solid #FA5F55')]},
{"selector":"thead", 'props':[("background-color","white"),("color","#FA5F55")]},# heading
#{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
{'selector':'th.row_heading', 'props':[('background-color','white'),('color','#FA5F55')]}#index
])\
.apply(Ratings35, axis=None)\
.apply(highlight_cells37, axis=None)\
.apply(highlight_cells38, axis=None)\
.apply(highlight_cells39, axis=None)\
.apply(highlight_cells40, axis=None)\
.apply(highlight_cells41, axis=None)\
#.set_table_attributes("style='display:inline'")\
#.set_caption('Caption table 1')
The 'system_rating_pg1' dataframe.
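The highlight functions differ only in the rows, the column and the CSS string they use. A minimal sketch of one reusable helper under that assumption is shown here; `cell_css` and `toy` are hypothetical names, and the helper is an alternative rather than the notebook's own code.

```python
import pandas as pd

def cell_css(frame: pd.DataFrame, rows, col: int, css: str) -> pd.DataFrame:
    """Return a frame of CSS strings: `css` at (row, col) for each row, '' elsewhere."""
    styles = pd.DataFrame("", index=frame.index, columns=frame.columns)
    for r in rows:
        styles.iloc[r, col] = css
    return styles

# Hypothetical three-row table with two of the notebook's column names.
toy = pd.DataFrame({"Name of Movie": ["A", "B", "C"], "Cost": ["$1", "$2", "$3"]})
styles = cell_css(toy, rows=[0, 2], col=1, css="background-color:yellow")
print(styles)
```

`Styler.apply` forwards extra keyword arguments to the function, so on an unstyled dataframe a call such as `df.style.apply(cell_css, axis=None, rows=[0, 2], col=1, css="background-color:yellow")` would be one way to use it.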
Styling system_rating_pg2 using the eight functions below and the row indexes hard-coded in them.
def Ratings41(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(23):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight42(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(23):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#FA5F55;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells43(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [1, 2, 3, 9, 19, 22]:
        df.iloc[i,1] = 'background-color:#FA5F55;color:white;border-bottom:2px solid black'
    return df
def highlight_cells44(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells45(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 3, 4, 6, 7, 8, 10, 11, 13, 15, 16, 18, 19, 20, 21, 22]:
        df.iloc[i,2] = 'background-color:#FA5F55;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells46(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [1, 2, 5, 9, 12, 14, 17]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells47(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 12, 18]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_pg2 = system_rating_pg2.style.apply(Ratings_highlight42, axis=None)\
.set_table_styles([{'selector' : '','props' : [('border','3px solid #FA5F55')]},
{"selector":"thead", 'props':[("background-color","white"),("color","#FA5F55")]},# heading
#{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
{'selector':'th.row_heading', 'props':[('background-color','white'),('color','#FA5F55')]}#index
])\
.apply(Ratings41, axis=None)\
.apply(highlight_cells43, axis=None)\
.apply(highlight_cells44, axis=None)\
.apply(highlight_cells45, axis=None)\
.apply(highlight_cells46, axis=None)\
.apply(highlight_cells47, axis=None)\
#.set_table_attributes("style='display:inline'")\
#.set_caption('Caption table 1')
The 'system_rating_pg2' dataframe.
Saving the styled system_rating_pg1 dataframe to system_rating_pg1.png as an image to be used later in the analysis.
dfi.export(system_rating_pg1, 'system_rating_pg1.png')
Saving the styled system_rating_pg2 dataframe to system_rating_pg2.png as an image to be used later in the analysis.
dfi.export(system_rating_pg2, 'system_rating_pg2.png')
This function displays the two dataframes side by side.
def display_side_by_side3(*args):
    html_str = "<center><font size=6 style='color:#FA5F55'>The Return On Investment on PG-rated Movies.</font></center> <br> "
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'),
        raw=True
    )
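Replacing every occurrence of 'table' also rewrites closing tags and any other text that contains the word. A narrower variant that only touches opening tags is sketched below; `inline_tables` is a hypothetical helper, and the HTML string is a made-up example rather than real Styler output.

```python
def inline_tables(html: str) -> str:
    """Add an inline display style to opening <table> tags only."""
    return html.replace("<table", '<table style="display:inline"')

sample_html = "<table><tr><td>1</td></tr></table>"
result = inline_tables(sample_html)
print(result)
```

This sketch assumes the opening tags carry no `style` attribute of their own; if they did, the two attributes would need to be merged instead.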
The cells below build the dataframes for the 'PG-13' rated movies in the Drama genre, based on the ROI of each movie.
Getting the index of all the 'PG-13' rated movies.
pg13_index = []
for i,x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13':pg13_index.append(i)
print(pg13_index) #showing the pg13_index list
[2, 4, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 60, 63, 65, 68, 69, 70, 72, 73, 74, 75, 78, 79, 80, 83, 86, 89, 91, 95, 96, 99, 100, 102, 104, 105, 108, 109, 113, 114, 115, 117, 119, 122, 123, 131, 132, 143, 149, 151]
Checking the number of elements in the 'pg13_index' list.
len(pg13_index)
76
Getting the Profit for all 'PG-13' rated movies.
pg13_profit = []
for i in pg13_index:
    pg13_profit.append(Drama_DataFrame.Profit[i])
print(pg13_profit) #showing the pg13_profit list
[583698673.0, 559454789.0, 77551594.0, -12181087.0, 35552675.0, 163591522.0, 129748880.0, 58660270.0, -8357834.0, -23612961.0, 22004627.0, 156127894.0, 4478084.0, 122498338.0, 129590606.0, -23659233.0, -8875633.0, 78809717.0, 136567581.0, 60143987.0, 49309093.0, 217276928.0, 26721826.0, 29802928.0, 132552290.0, 167618160.0, 38984536.0, -13518595.0, 66050951.0, -11684491.0, 15059418.0, 188120004.0, 117033509.0, 71633833.0, 41540205.0, 4847480.0, 57917283.0, 40282881.0, -15953962.0, 188265198.0, 2281732.0, 57086711.0, -3810190.0, -10319750.0, 317522294.0, 21028230.0, 36545707.0, 40506120.0, 113955898.0, 5601987.0, 44168692.0, 20044909.0, 20069303.0, 20909437.0, 11477345.0, 67356170.0, 51076141.0, 51603136.0, 21556959.0, 27087044.0, 72831866.0, 12971021.0, 23787727.0, 29964656.0, 10369708.0, 143806510.0, 36699612.0, 13945682.0, 1205034.0, -3472240.0, -3157373.0, 12698355.0, 33185884.0, 4152584.0, 3478400.0, 1927779.0]
Checking the number of elements in the 'pg13_profit' list.
len(pg13_profit)
76
Getting the Cost for all 'PG-13' rated movies.
pg13_cost = []
for i in pg13_index:
    pg13_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(pg13_cost) #showing the pg13_cost list
['$110,000,000', '$75,000,000', '$60,000,000', '$60,000,000', '$55,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$50,000,000', '$49,000,000', '$47,000,000', '$44,000,000', '$40,000,000', '$40,000,000', '$40,000,000', '$40,000,000', '$38,000,000', '$37,000,000', '$37,000,000', '$36,000,000', '$35,000,000', '$35,000,000', '$34,000,000', '$33,000,000', '$30,000,000', '$30,000,000', '$30,000,000', '$28,000,000', '$27,500,000', '$26,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$25,000,000', '$24,000,000', '$21,000,000', '$20,000,000', '$20,000,000', '$19,000,000', '$18,000,000', '$18,000,000', '$17,000,000', '$17,000,000', '$16,000,000', '$16,000,000', '$15,000,000', '$15,000,000', '$15,000,000', '$14,000,000', '$13,000,000', '$12,000,000', '$12,000,000', '$11,000,000', '$11,000,000', '$10,000,000', '$10,000,000', '$9,700,000', '$9,000,000', '$9,000,000', '$7,400,000', '$7,000,000', '$6,000,000', '$5,000,000', '$5,000,000', '$5,000,000', '$5,000,000', '$4,500,000', '$4,357,373', '$2,600,000', '$2,000,000', '$1,400,000', '$250,000', '$175,000']
Checking the number of elements in the 'pg13_cost' list.
len(pg13_cost)
76
Getting the Name for all 'PG-13' rated movies.
pg13_name = []
for i in pg13_index:
    pg13_name.append(Drama_DataFrame.Movie[i])
print(pg13_name) #showing the pg13_name list
['Gravity', 'Sing', 'Contagion', 'Trouble with the Curve', 'Burlesque', 'Creed II', 'The Post', 'Hereafter', 'Dream House', 'Upside Down', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Bridge of Spies', 'The Impossible', 'Paranoia', 'Victor Frankenstein', 'Water for Elephants', 'Creed', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'The Longest Ride', 'Step Up Revolution', 'The Vow', 'The Age of Adaline', 'The Space Between Us', 'Safe Haven', 'Anonymous', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Fences', 'The Beaver', 'Me Before You', 'The Light Between Oceans', 'The Book Thief', 'Labor Day', 'Midnight Special', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'The Woman in Black', 'Country Strong', 'One Day', 'Suffragette', 'The Perks of Being a Wallflower', 'Project Almanac', 'Wish Upon', 'If I Stay', 'Brooklyn', 'Everything, Everything', 'Mud', 'Amour', 'Ouija: Origin of Evil', 'Black or White', 'The Bye Bye Man', 'Gifted', 'The Words', 'Lights Out', 'Still Alice', 'Before I Fall', 'Rabbit Hole', 'Maggie', 'Anna', 'Ida', 'Courageous', 'Mustang', 'Like Crazy', 'Another Earth']
Checking the number of elements in the 'pg13_name' list.
len(pg13_name)
76
Getting the ROI (the dollar profit stored in the Profit_x column) for all 'PG-13' rated movies.
pg13_return_on_investment = []
for i in pg13_index:
    pg13_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(pg13_return_on_investment) #showing the pg13_return_on_investment list
['$583,698,673', '$559,454,789', '$77,551,594', '$-12,181,087', '$35,552,675', '$163,591,522', '$129,748,880', '$58,660,270', '$-8,357,834', '$-23,612,961', '$22,004,627', '$156,127,894', '$4,478,084', '$122,498,338', '$129,590,606', '$-23,659,233', '$-8,875,633', '$78,809,717', '$136,567,581', '$60,143,987', '$49,309,093', '$217,276,928', '$26,721,826', '$29,802,928', '$132,552,290', '$167,618,160', '$38,984,536', '$-13,518,595', '$66,050,951', '$-11,684,491', '$15,059,418', '$188,120,004', '$117,033,509', '$71,633,833', '$41,540,205', '$4,847,480', '$57,917,283', '$40,282,881', '$-15,953,962', '$188,265,198', '$2,281,732', '$57,086,711', '$-3,810,190', '$-10,319,750', '$317,522,294', '$21,028,230', '$36,545,707', '$40,506,120', '$113,955,898', '$5,601,987', '$44,168,692', '$20,044,909', '$20,069,303', '$20,909,437', '$11,477,345', '$67,356,170', '$51,076,141', '$51,603,136', '$21,556,959', '$27,087,044', '$72,831,866', '$12,971,021', '$23,787,727', '$29,964,656', '$10,369,708', '$143,806,510', '$36,699,612', '$13,945,682', '$1,205,034', '$-3,472,240', '$-3,157,373', '$12,698,355', '$33,185,884', '$4,152,584', '$3,478,400', '$1,927,779']
Checking the number of elements in the 'pg13_return_on_investment' list.
len(pg13_return_on_investment)
76
Getting the Ratings of all 'PG-13' rated movies.
pg13_rating = []
for i in pg13_index:
    pg13_rating.append(Drama_DataFrame.Averagerating[i])
print(pg13_rating) #showing the pg13_rating list
[7.7, 7.1, 6.6, 6.8, 6.4, 7.2, 7.2, 6.5, 6.0, 6.6, 6.6, 7.9, 6.5, 7.6, 7.6, 5.7, 6.0, 6.9, 5.8, 6.0, 6.8, 7.6, 6.8, 7.1, 6.5, 6.8, 7.2, 6.4, 6.7, 6.9, 6.7, 8.1, 6.3, 6.5, 6.5, 6.8, 4.5, 7.2, 6.7, 7.4, 7.2, 7.6, 6.9, 6.6, 6.6, 5.6, 4.9, 7.1, 6.4, 6.3, 7.0, 6.9, 8.0, 6.4, 5.0, 6.8, 7.5, 6.4, 7.4, 7.9, 6.1, 6.6, 4.3, 5.6, 7.1, 6.4, 7.5, 6.4, 7.0, 6.4, 6.5, 7.4, 7.0, 7.6, 6.7, 7.0]
Checking the number of elements in the 'pg13_rating' list.
len(pg13_rating)
76
Getting the Profit Percentage (profit divided by production budget, times 100) of all 'PG-13' rated movies.
pg13_percent_profit = []
for i in pg13_index:
    pct = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    pg13_percent_profit.append(int(round(pct,0)))
print(pg13_percent_profit) #showing the pg13_percent_profit list
[531, 746, 129, -20, 65, 327, 259, 117, -17, -47, 45, 332, 10, 306, 324, -59, -22, 207, 369, 163, 137, 621, 76, 88, 402, 559, 130, -45, 236, -42, 58, 752, 468, 287, 166, 19, 232, 168, -76, 941, 11, 300, -21, -57, 1868, 124, 228, 253, 760, 37, 294, 143, 154, 174, 96, 612, 464, 516, 216, 279, 809, 144, 321, 428, 173, 2876, 734, 279, 24, -77, -72, 488, 1659, 297, 1391, 1102]
Checking the number of elements in the 'pg13_percent_profit' list.
len(pg13_percent_profit)
76
Formatting the integer ROI percentages of all 'PG-13' rated movies as strings with a % sign.
pg13_roi_percent = []
for i in pg13_percent_profit:
    pg13_roi_percent.append("{:}%".format(i))
print(pg13_roi_percent) #showing the pg13_roi_percent list
['531%', '746%', '129%', '-20%', '65%', '327%', '259%', '117%', '-17%', '-47%', '45%', '332%', '10%', '306%', '324%', '-59%', '-22%', '207%', '369%', '163%', '137%', '621%', '76%', '88%', '402%', '559%', '130%', '-45%', '236%', '-42%', '58%', '752%', '468%', '287%', '166%', '19%', '232%', '168%', '-76%', '941%', '11%', '300%', '-21%', '-57%', '1868%', '124%', '228%', '253%', '760%', '37%', '294%', '143%', '154%', '174%', '96%', '612%', '464%', '516%', '216%', '279%', '809%', '144%', '321%', '428%', '173%', '2876%', '734%', '279%', '24%', '-77%', '-72%', '488%', '1659%', '297%', '1391%', '1102%']
Checking the number of elements in the 'pg13_roi_percent' list.
len(pg13_roi_percent)
76
Turning the average rating of each 'PG-13' rated movie into a string of stars, one star per whole rating point.
pg13_stars = []
for i in pg13_rating:
    pg13_stars.append('*'*int(i))
print(pg13_stars) #showing the pg13_stars list
['*******', '*******', '******', '******', '******', '*******', '*******', '******', '******', '******', '******', '*******', '******', '*******', '*******', '*****', '******', '******', '*****', '******', '******', '*******', '******', '*******', '******', '******', '*******', '******', '******', '******', '******', '********', '******', '******', '******', '******', '****', '*******', '******', '*******', '*******', '*******', '******', '******', '******', '*****', '****', '*******', '******', '******', '*******', '******', '********', '******', '*****', '******', '*******', '******', '*******', '*******', '******', '******', '****', '*****', '*******', '******', '*******', '******', '*******', '******', '******', '*******', '*******', '*******', '******', '*******']
Checking the number of elements in the 'pg13_stars' list.
len(pg13_stars)
76
Creating the 'PG-13' rated dataframe with the variables previously created.
system_rating_pg13 = pd.DataFrame({"Name of Movie":pg13_name, "Cost":pg13_cost,
"Return On Investment":pg13_return_on_investment,
"ROI Percentage":pg13_roi_percent,"Ratings":pg13_stars})
The 'system_rating_pg13' dataframe. (this dataframe is interactive)
system_rating_pg13
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
Getting the index of all the movies whose ROI percentage is zero or negative.
neg_values = []
for i,x in enumerate(pg13_percent_profit):
    if x <= 0: neg_values.append(i)
print(neg_values) #showing the neg_values list
[3, 8, 9, 15, 16, 27, 29, 38, 42, 43, 69, 70]
Checking the number of elements in the 'neg_values' list.
len(neg_values)
12
Dropping the rows with a zero or negative ROI percentage and resetting the index of the system_rating_pg13 dataframe.
system_rating_pg13= system_rating_pg13.drop(labels=neg_values, axis=0)
system_rating_pg13 = system_rating_pg13.reset_index(drop=True)
The new 'system_rating_pg13' dataframe. It will be divided into three dataframes. (this dataframe is interactive)
system_rating_pg13
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_pg131 is the first dataframe. (this dataframe is interactive)
system_rating_pg131=system_rating_pg13[:22]
system_rating_pg131
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_pg132 is the second dataframe. (this dataframe is interactive)
system_rating_pg132=system_rating_pg13[22:42]
system_rating_pg132
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
system_rating_pg133 is the third dataframe. (this dataframe is interactive)
system_rating_pg133=system_rating_pg13[42:]
system_rating_pg133
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
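The three slices above cut the 64 remaining rows (76 minus the 12 dropped) at positions 22 and 42. A small helper can do the same split from a list of cut points. This is only a sketch: `split_rows` is a hypothetical name and `demo` is placeholder data.

```python
import pandas as pd

def split_rows(df, bounds):
    """Split df into consecutive pieces at the given row positions."""
    edges = [0, *bounds, len(df)]
    return [df.iloc[start:stop] for start, stop in zip(edges[:-1], edges[1:])]

demo = pd.DataFrame({"n": range(64)})  # 64 placeholder rows
first, second, third = split_rows(demo, [22, 42])
print(len(first), len(second), len(third))
```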
Getting the average Budget of all the 'PG-13' rated movies in the Drama genre.
pg13_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_pg13['Cost']]) / len(system_rating_pg13['Cost'])
The average Budget of all the 'PG-13' rated Drama movies is $24,695,703.
pg13_avg_value
24695703.125
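The currency strings can also be converted with pandas string methods before averaging. The helper name `currency_to_int` is an assumption made for this sketch; the sample strings only mimic the '$1,234' format used in the 'Cost' column.

```python
import pandas as pd

def currency_to_int(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' from strings such as '$6,500,000' and return integers."""
    cleaned = series.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    return cleaned.astype("int64")

costs = pd.Series(["$6,500,000", "$20,000", "$1,500,000"])
avg_cost = currency_to_int(costs).mean()
print(f"${avg_cost:,.0f}")
```

Converting the column once and reusing the numeric result avoids repeating the same `replace` chain in every later cell.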
Getting the index of all the movies whose Budget is at or below the average Budget of all the 'PG-13' rated Drama movies, and the index of all the movies whose Budget is at or above that average.
pg13_cost_index = [int(i.replace('$', '').replace(',', ''))
                   for i in system_rating_pg13['Cost']]
# at or below avg
pg13_below_avg1 = []
for i, x in enumerate(pg13_cost_index):
    if x <= 24695703:
        pg13_below_avg1.append(i)
# at or above avg (the list keeps its "below" name because later cells refer to it)
pg13_below_avg2 = []
for i, x in enumerate(pg13_cost_index):
    if x >= 24695703:
        pg13_below_avg2.append(i)
The 'pg13_below_avg1' list.
print(pg13_below_avg1)
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
The 'pg13_below_avg2' list.
print(pg13_below_avg2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
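The threshold in the loops above is typed in by hand from the printed average. A possible alternative computes the threshold and both index lists from the data itself. The budgets below are hypothetical, and this sketch uses a strict `<` on one side so that no index can land in both lists.

```python
import numpy as np

costs = np.array([30_000_000, 5_000_000, 12_000_000, 45_000_000])  # hypothetical budgets
threshold = costs.mean()

# Positions of the values below the mean, and of the values at or above it.
below_avg_idx = np.flatnonzero(costs < threshold).tolist()
at_or_above_avg_idx = np.flatnonzero(costs >= threshold).tolist()
print(below_avg_idx, at_or_above_avg_idx)
```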
Getting the average Return On Investment (in dollars) of all the 'PG-13' rated movies in the Drama genre.
pg13_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_pg13['Return On Investment']]) / len(system_rating_pg13['Cost'])
The average Return On Investment of all the 'PG-13' rated Drama movies is $79,724,974.
pg13_avg_value
79724974.890625
Getting the index of all the movies that are at or below the average Return On Investment of all the 'PG-13' rated Drama movies, and the index of all the movies that are at or above it.
pg13_roi_index = [int(i.replace('$', '').replace(',', ''))
                  for i in system_rating_pg13['Return On Investment']]
# at or below avg
pg13_below_avg3 = []
for i, x in enumerate(pg13_roi_index):
    if x <= 79724974:
        pg13_below_avg3.append(i)
# at or above avg (the list keeps its "below" name because later cells refer to it)
pg13_below_avg4 = []
for i, x in enumerate(pg13_roi_index):
    if x >= 79724974:
        pg13_below_avg4.append(i)
The 'pg13_below_avg3' list.
print(pg13_below_avg3)
[2, 3, 6, 7, 9, 12, 14, 15, 17, 18, 21, 22, 23, 26, 27, 28, 29, 30, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63]
The 'pg13_below_avg4' list.
print(pg13_below_avg4)
[0, 1, 4, 5, 8, 10, 11, 13, 16, 19, 20, 24, 25, 31, 34, 38, 55]
Getting the average Return On Investment Percentage of all the 'PG-13' rated movies in the Drama genre.
pg13_avg_value = sum([int(i.replace('%', ''))
for i in system_rating_pg13['ROI Percentage']]) / len(system_rating_pg13['Cost'])
The average Return On Investment Percentage of all the 'PG-13' rated Drama movies is 414%.
pg13_avg_value
414.4375
Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'PG-13' rated Drama movies.
roi_percent_index_pg13 = [int(i.replace('%', ''))
                          for i in system_rating_pg13['ROI Percentage']]
# at or above avg
pg13_above_avg = []
for i, x in enumerate(roi_percent_index_pg13):
    if x >= 414:
        pg13_above_avg.append(i)
The 'pg13_above_avg' list.
print(pg13_above_avg)
[0, 1, 16, 20, 24, 25, 31, 34, 38, 45, 46, 47, 50, 53, 55, 56, 59, 60, 62, 63]
Styling system_rating_pg131 using the eight functions and the indexes to do so.
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_below_avg1:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_below_avg2[:22]:
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df
def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_below_avg3[:11]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_below_avg4[:11]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_pg131 = (
    system_rating_pg131.style.apply(Ratings_highlight49, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #DE3163')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', ' black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # index
    ])
    .apply(Ratings48, axis=None)
    .apply(highlight_cells51, axis=None)
    .apply(highlight_cells52, axis=None)
    .apply(highlight_cells53, axis=None)
    .apply(highlight_cells54, axis=None)
    # .apply(highlight_cells50, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
The 'system_rating_pg131' dataframe.
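The highlight functions above differ only in which rows, which column, and which CSS string they use, so they could be produced by one small factory. This is a sketch under assumptions: `highlight_rows` is a hypothetical helper, not part of pandas, and `demo` is placeholder data. The returned function has the shape `Styler.apply(..., axis=None)` expects: it takes the whole dataframe and returns a same-shaped dataframe of CSS strings.

```python
import pandas as pd

def highlight_rows(rows, col, css):
    """Build a function for Styler.apply(axis=None) that styles (row, col) cells."""
    def _style(data):
        # Start from an all-empty grid of CSS strings with the same labels.
        styles = pd.DataFrame("", index=data.index, columns=data.columns)
        for r in rows:
            if r < len(data):  # ignore positions beyond the end of the table
                styles.iloc[r, col] = css
        return styles
    return _style

demo = pd.DataFrame({"Name of Movie": ["A", "B", "C"], "Cost": ["$1", "$2", "$3"]})
above_avg_cost = highlight_rows([0, 2], col=1, css="background-color:#DE3163;color:white")
css_grid = above_avg_cost(demo)
print(css_grid)
# In a notebook with jinja2 installed: demo.style.apply(above_avg_cost, axis=None)
```

The length guard plays the role of the `[:22]` and `[:11]` slices used in the cells above, which keep the row positions inside the table being styled.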
Styling system_rating_pg132 using the eight functions and the indexes to do so.
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(20):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(20):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(8,20):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(8):
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df
def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15, 17, 18, 19]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [2, 3, 9, 16, 12]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [2, 3, 9, 16, 12]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_pg132 = (
    system_rating_pg132.style.apply(Ratings_highlight49, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #DE3163')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', ' black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # index
    ])
    .apply(Ratings48, axis=None)
    .apply(highlight_cells51, axis=None)
    .apply(highlight_cells52, axis=None)
    .apply(highlight_cells53, axis=None)
    .apply(highlight_cells54, axis=None)
    .apply(highlight_cells50, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
The 'system_rating_pg132' dataframe.
Styling system_rating_pg133 using the eight functions and the indexes to do so.
def Ratings48(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight49(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#DE3163;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
        df.iloc[i,1] = 'font-size:4pt'
    return df
def highlight_cells50(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(22):
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom:2px solid black'
    return df
def highlight_cells51(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in pg13_below_avg2[:22]:
        df.iloc[i,1] = 'background-color:#DE3163;color:white;border-bottom:2px solid black'
    return df
def highlight_cells52(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]:
        df.iloc[i,2] = 'background-color:#DE3163;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells53(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[11,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells54(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [3, 4, 5, 8, 11, 13, 14, 17, 18, 20, 21]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_pg133 = (
    system_rating_pg133.style.apply(Ratings_highlight49, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #DE3163')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', ' black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#DE3163')]},  # index
    ])
    .apply(Ratings48, axis=None)
    .apply(highlight_cells50, axis=None)
    .apply(highlight_cells52, axis=None)
    .apply(highlight_cells53, axis=None)
    .apply(highlight_cells54, axis=None)
    # .apply(highlight_cells51, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
The 'system_rating_pg133' dataframe.
Saving the System_rating_pg131 dataframe to the System_rating_pg131.png file as an image to be used for the analysis later on.
dfi.export(system_rating_pg131, 'system_rating_pg131.png')
Saving the System_rating_pg132 dataframe to the System_rating_pg132.png file as an image to be used for the analysis later on.
dfi.export(system_rating_pg132, 'system_rating_pg132.png')
Saving the System_rating_pg133 dataframe to the System_rating_pg133.png file as an image to be used for the analysis later on.
dfi.export(system_rating_pg133, 'system_rating_pg133.png')
This allows all three dataframes to be displayed side by side.
def display_side_by_side4(*args):
    html_str = "<center><font size=6 style='color:#DE3163'>The Return On Investment on PG13-rated Movies.</font></center> <br> "
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'),
        raw=True
    )
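The function above works by replacing the word 'table' everywhere in the HTML string, which also touches closing tags and any cell text containing that word. One possible alternative, sketched here with a hypothetical helper name `side_by_side_html`, wraps each fragment in an inline-block container and leaves the table markup untouched.

```python
from html import escape

def side_by_side_html(fragments, title=""):
    """Wrap each HTML fragment in an inline-block div so they sit on one row."""
    heading = f"<h3 style='text-align:center'>{escape(title)}</h3>" if title else ""
    cells = "".join(
        f"<div style='display:inline-block;vertical-align:top;margin-right:1em'>{frag}</div>"
        for frag in fragments
    )
    return heading + cells

page = side_by_side_html(
    ["<table><tr><td>1</td></tr></table>", "<table><tr><td>2</td></tr></table>"],
    title="ROI of PG-13 rated movies",
)
print(page)
```

In a notebook, the returned string could be passed to `IPython.display.display_html(page, raw=True)`, with the fragments coming from each styled dataframe's `to_html()`.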
Below, dataframes are created for the 'NC-17' rated movies in the 'Drama' genre, based on the ROI of each movie.
Index of all the 'NC-17' rated movies.
nc17_index = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17':
        nc17_index.append(i)
print(nc17_index)  # showing the nc17_index list
[112, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305]
Checking the number of elements in the 'nc17_index' list.
len(nc17_index)
49
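The index loop above, and the per-column loops that follow it, could also be expressed as one boolean selection. The sketch uses a small hypothetical stand-in for `Drama_DataFrame`; only the column names `Movie`, `Rating` and `Profit` are taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for Drama_DataFrame with placeholder values.
drama = pd.DataFrame({
    "Movie": ["A", "B", "C"],
    "Rating": ["R", "NC-17", "NC-17"],
    "Profit": [10.0, -5.0, 20.0],
})

nc17 = drama[drama["Rating"] == "NC-17"]   # every column for the matching rows
nc17_index = nc17.index.tolist()
nc17_profit = nc17["Profit"].tolist()
print(nc17_index, nc17_profit)
```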
Getting the Profit for all 'NC-17' rated movies.
nc17_profit = []
for i in nc17_index:
    nc17_profit.append(Drama_DataFrame.Profit[i])
print(nc17_profit)  # showing the nc17_profit list
[13912841.0, 4856268.0, 8404.0, 257845.0, 659312.0, 18912216.0, -24649246.0, 89410061.0, -4503941.0, 121165.0, -1712236.0, 52091915.0, 13912841.0, 15465835.0, -7249246.0, 307113.0, 13912841.0, 15390895.0, 15566240.0, 1315026.0, 256669.0, 201120004.0, -5340890.0, 50167430.0, -17763156.0, 2311944.0, -794431.0, 13912841.0, -2605698.0, 2548651.0, -216465.0, -596907.0, 16283563.0, 3664240.0, 1038916.0, -2509128.0, 8000000.0, 18912216.0, 94673038.0, 34897711.0, 401802.0, 50167430.0, 3546453.0, -672713.0, -13085834.0, -3838180.0, 958404.0, -2237424.0, 858737.0]
Checking the number of elements in the 'nc17_profit' list.
len(nc17_profit)
49
Getting the Cost for all 'NC-17' rated movies.
nc17_cost = []
for i in nc17_index:
    nc17_cost.append(Drama_DataFrame.Production_Budget_x[i])
print(nc17_cost)  # showing the nc17_cost list
['$6,500,000', '$12,500,000', '$1,000,000', '$20,000', '$955,472', '$1,500,000', '$45,000,000', '$9,000,000', '$5,000,000', '$15,000,000', '$2,734,384', '$15,000,000', '$6,500,000', '$4,000,000', '$45,000,000', '$15,000,000', '$6,500,000', '$4,074,940', '$1,000,000', '$1,000,000', '$3,565,572', '$12,000,000', '$10,000,000', '$15,000,000', '$19,000,000', '$350,000', '$1,000,000', '$6,500,000', '$4,700,000', '$904,765', '$3,000,000', '$700,000', '$34,000,000', '$230,000', '$1,000,000', '$3,200,000', '$1,000,000', '$1,500,000', '$6,500,000', '$1,250,000', '$12,000', '$15,000,000', '$2,200,000', '$1,300,000', '$15,000,000', '$6,400,000', '$50,000', '$3,259,572', '$612,072']
Checking the number of elements in the 'nc17_cost' list.
len(nc17_cost)
49
Getting the Name for all 'NC-17' rated movies.
nc17_name = []
for i in nc17_index:
    nc17_name.append(Drama_DataFrame.Movie[i])
print(nc17_name)  # showing the nc17_name list
['Shame', 'Matador', 'Whore', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Kids', 'Showgirls', 'Crash', 'Bent', 'The Dreamers', 'Ma mère', 'Lust, Caution', 'Shame', 'Blue Is the Warmest Colour', 'Showgirls', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Killer Joe', 'Se, jie', 'Queen of Hearts', 'The Evil Dead', 'Man Bites Dog', 'Shame', 'Nymphomaniac: Vol. I', 'Arabian Nights', 'Frontier(s)', 'Chained', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'The Big Feast', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Whore 1991', 'Ma Mère', 'Law of Desire']
Checking the number of elements in the 'nc17_name' list.
len(nc17_name)
49
Getting the ROI for all 'NC-17' rated movies.
nc17_return_on_investment = []
for i in nc17_index:
    nc17_return_on_investment.append(Drama_DataFrame.Profit_x[i])
print(nc17_return_on_investment)  # showing the nc17_return_on_investment list
['$13,912,841', '$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$-24,649,246', '$89,410,061', '$-4,503,941', '$121,165', '$-1,712,236', '$52,091,915', '$13,912,841', '$15,465,835', '$-7,249,246', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$-5,340,890', '$50,167,430', '$-17,763,156', '$2,311,944', '$-794,431', '$13,912,841', '$-2,605,698', '$2,548,651', '$-216,465', '$-596,907', '$16,283,563', '$3,664,240', '$1,038,916', '$-2,509,128', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$-672,713', '$-13,085,834', '$-3,838,180', '$958,404', '$-2,237,424', '$858,737']
Checking the number of elements in the 'nc17_return_on_investment' list.
len(nc17_return_on_investment)
49
Getting the Ratings of all 'NC-17' rated movies.
nc17_rating = []
for i in nc17_index:
    nc17_rating.append(Drama_DataFrame.Averagerating[i])
print(nc17_rating)  # showing the nc17_rating list
[7.2, 7.0, 5.6, 6.0, 5.7, 7.1, 4.9, 6.4, 7.2, 7.2, 5.1, 7.5, 7.2, 7.7, 4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
Checking the number of elements in the 'nc17_rating' list.
len(nc17_rating)
49
Getting the Profit Percentage of all 'NC-17' rated movies.
nc17_percent_profit = []
for i in nc17_index:
    # profit as a percentage of the production budget, rounded to a whole number
    pct = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
    nc17_percent_profit.append(int(round(pct, 0)))
print(nc17_percent_profit)  # showing the nc17_percent_profit list
[214, 39, 1, 1289, 69, 1261, -55, 993, -90, 1, -63, 347, 214, 387, -16, 2, 214, 378, 1557, 132, 7, 1676, -53, 334, -93, 661, -79, 214, -55, 282, -7, -85, 48, 1593, 104, -78, 800, 1261, 1457, 2792, 3348, 334, 161, -52, -87, -60, 1917, -69, 140]
Checking the number of elements in the 'nc17_percent_profit' list.
len(nc17_percent_profit)
49
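As a check on the formula, profit divided by budget times 100, three entries can be recomputed from the profit and cost lists printed earlier. The function name `roi_percent` is introduced only for this sketch.

```python
def roi_percent(profit, budget):
    """Return profit as a whole-number percentage of the production budget."""
    return int(round(profit / budget * 100, 0))

print(roi_percent(13_912_841, 6_500_000))    # 'Shame': 214
print(roi_percent(257_845, 20_000))          # 'Tokyo Decadence': 1289
print(roi_percent(-24_649_246, 45_000_000))  # 'Showgirls': -55
```

These agree with the first, fourth and seventh values in the 'nc17_percent_profit' list.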
Converting the integer ROI values of all 'NC-17' rated movies to percentage strings.
nc17_roi_percent = []
for i in nc17_percent_profit:
    nc17_roi_percent.append("{:}%".format(i))
print(nc17_roi_percent)  # showing the nc17_roi_percent list
['214%', '39%', '1%', '1289%', '69%', '1261%', '-55%', '993%', '-90%', '1%', '-63%', '347%', '214%', '387%', '-16%', '2%', '214%', '378%', '1557%', '132%', '7%', '1676%', '-53%', '334%', '-93%', '661%', '-79%', '214%', '-55%', '282%', '-7%', '-85%', '48%', '1593%', '104%', '-78%', '800%', '1261%', '1457%', '2792%', '3348%', '334%', '161%', '-52%', '-87%', '-60%', '1917%', '-69%', '140%']
Checking the number of elements in the 'nc17_roi_percent' list.
len(nc17_roi_percent)
49
Turning the average rating of each 'NC-17' rated movie into a string of stars, one star per whole rating point.
nc17_stars = []
for i in nc17_rating:
    nc17_stars.append('*' * int(i))
print(nc17_stars)  # showing the nc17_stars list
['*******', '*******', '*****', '******', '*****', '*******', '****', '******', '*******', '*******', '*****', '*******', '*******', '*******', '****', '*******', '*******', '*******', '*******', '*****', '*****', '*****', '******', '*******', '*******', '*******', '*******', '*******', '******', '******', '******', '******', '*******', '*******', '*******', '******', '******', '*******', '*******', '******', '******', '*******', '*******', '******', '*****', '******', '*****', '*****', '*******']
Checking the number of elements in the 'nc17_stars' list.
len(nc17_stars)
49
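Because `int()` drops the fractional part, ratings of 7.2 and 7.7 both become seven stars and 4.9 becomes four. The short sketch below shows that behaviour; the name `to_stars` is an assumption used only here.

```python
def to_stars(rating):
    """One '*' per whole rating point; the fractional part is dropped."""
    return "*" * int(rating)

print(to_stars(7.2), to_stars(7.7), to_stars(4.9))
```

If rounding to the nearest whole star were preferred, `int(round(rating))` could be used instead, though that would change the star counts shown in the tables.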
Creating the 'NC-17' rated dataframe with the variables previously created.
system_rating_nc17 = pd.DataFrame({"Name of Movie":nc17_name, "Cost":nc17_cost,
"Return On Investment":nc17_return_on_investment,
"ROI Percentage":nc17_roi_percent,"Ratings":nc17_stars})
The 'system_rating_nc17' dataframe. (this dataframe is interactive)
system_rating_nc17
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
Getting the index of all the zero or negative values.
neg_values = []
for i, x in enumerate(nc17_percent_profit):
    if x <= 0:
        neg_values.append(i)
print(neg_values)  # showing the neg_values list
[6, 8, 10, 14, 22, 24, 26, 28, 30, 31, 35, 43, 44, 45, 47]
Checking the number of elements in the 'neg_values' list.
len(neg_values)
15
Dropping the rows with zero or negative values and resetting the index of the system_rating_nc17 dataframe.
system_rating_nc17 = system_rating_nc17.drop(labels=neg_values, axis=0)
system_rating_nc17 = system_rating_nc17.reset_index(drop=True)
The new 'system_rating_nc17' dataframe. It will be divided into two dataframes. (this dataframe is interactive)
system_rating_nc17
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_nc171 is the first dataframe. (this dataframe is interactive)
system_rating_nc171=system_rating_nc17[:17]
system_rating_nc171
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
System_rating_nc172 is the second dataframe. (this dataframe is interactive)
system_rating_nc172=system_rating_nc17[17:]
system_rating_nc172
| Name of Movie | Cost | Return On Investment | ROI Percentage | Ratings |
|---|---|---|---|---|
Getting the average Budget of all the 'NC-17' rated movies in the Drama genre.
nc_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_nc17['Cost']]) / len(system_rating_nc17['Cost'])
The average Budget of all the 'NC-17' rated Drama movies is $5,918,377.
nc_avg_value
5918377.088235294
Getting the index of all the movies whose Budget is at or below the average Budget of all the 'NC-17' rated Drama movies, and the index of all the movies whose Budget is at or above that average.
nc_cost_index = [int(i.replace('$', '').replace(',', '')) for i in system_rating_nc17['Cost']]
# at or below avg
nc_below_avg1 = []
for i, x in enumerate(nc_cost_index):
    if x <= 5918377:
        nc_below_avg1.append(i)
# at or above avg (the list keeps its "below" name because later cells refer to it)
nc_below_avg2 = []
for i, x in enumerate(nc_cost_index):
    if x >= 5918377:
        nc_below_avg2.append(i)
The 'nc_below_avg1' list.
print(nc_below_avg1)
[2, 3, 4, 5, 10, 13, 14, 15, 16, 19, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33]
The 'nc_below_avg2' list.
print(nc_below_avg2)
[0, 1, 6, 7, 8, 9, 11, 12, 17, 18, 20, 22, 27, 30]
Getting the average Return On Investment (in dollars) of all the 'NC-17' rated movies in the Drama genre.
nc_avg_value = sum([int(i.replace('$', '').replace(',', ''))
for i in system_rating_nc17['Return On Investment']]) / len(system_rating_nc17['Cost'])
The average Return On Investment of all the 'NC-17' rated Drama movies is $22,347,672.
nc_avg_value
22347672.55882353
Getting the index of all the movies that are at or below the average Return On Investment of all the 'NC-17' rated Drama movies, and the index of all the movies that are at or above it.
nc_roi_index = [int(i.replace('$', '').replace(',', ''))
                for i in system_rating_nc17['Return On Investment']]
# at or below avg
nc_below_avg3 = []
for i, x in enumerate(nc_roi_index):
    if x <= 22347672:
        nc_below_avg3.append(i)
# at or above avg (the list keeps its "below" name because later cells refer to it)
nc_below_avg4 = []
for i, x in enumerate(nc_roi_index):
    if x >= 22347672:
        nc_below_avg4.append(i)
The 'nc_below_avg3' list.
print(nc_below_avg3)
[0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 29, 31, 32, 33]
The 'nc_below_avg4' list.
print(nc_below_avg4)
[6, 8, 17, 18, 27, 28, 30]
Getting the average Return On Investment Percentage of all the 'NC-17' rated movies in the Drama genre.
nc_avg_value = sum([int(i.replace('%', ''))
for i in system_rating_nc17['ROI Percentage']]) / len(system_rating_nc17['Cost'])
The average Return On Investment Percentage of all the 'NC-17' rated Drama movies is 712%.
nc_avg_value
712.5588235294117
Getting the index of all the movies that are at or above the average Return On Investment Percentage of all the 'NC-17' rated Drama movies.
roi_percent_index_nc = [int(i.replace('%', '')) for i in system_rating_nc17['ROI Percentage']]
# at or above avg
nc_above_avg = []
for i, x in enumerate(roi_percent_index_nc):
    if x >= 712:
        nc_above_avg.append(i)
The 'nc_above_avg' list.
print(nc_above_avg)
[3, 5, 6, 14, 17, 23, 25, 26, 27, 28, 29, 32]
Styling system_rating_nc171 using the eight functions and the indexes to do so.
def Ratings1(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(17):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight2(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(17):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#581845;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df
def highlight_cells3(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in nc_below_avg1[:9]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells4(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in nc_below_avg2[:8]:
        df.iloc[i,1] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells5(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in nc_below_avg3[:15]:
        df.iloc[i,2] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells6(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in nc_below_avg4[:2]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells7(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in nc_above_avg[:4]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_nc171 = (
    system_rating_nc171.style.apply(Ratings_highlight2, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #581845')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#581845')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', ' black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#581845')]},  # index
    ])
    .apply(Ratings1, axis=None)
    .apply(highlight_cells3, axis=None)
    .apply(highlight_cells4, axis=None)
    .apply(highlight_cells5, axis=None)
    .apply(highlight_cells6, axis=None)
    .apply(highlight_cells7, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 1')
)
Saving the System_rating_nc171 dataframe to the System_rating_nc171.png file as an image to be used for the analysis later on.
dfi.export(system_rating_nc171, 'system_rating_nc171.png')
The 'system_rating_nc171' dataframe.
Styling system_rating_nc172 using the eight functions and the indexes to do so.
def Ratings8(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(17):  # range(19,37):
        df.iloc[i,-1] = "font-size : 11pt;font-weight: bold"
        df.iloc[i,-4] = "font-size : 8pt"
    return df
def Ratings_highlight9(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in range(17):
        df.iloc[i,-1] = 'color:#FFD700;background-color:white'
        df.iloc[i,0] = 'color:#581845;background-color:white;font-size:8pt;font-weight: bold'
        df.iloc[i,2:4] = 'color:black;background-color:white;font-size:8pt'
    return df
def highlight_cells10(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [2, 3, 4, 6, 7, 8, 9, 11, 12, 14, 15, 16]:
        df.iloc[i,1] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells90(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 1, 5, 10, 13]:
        df.iloc[i,1] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells11(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [2, 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 16]:
        df.iloc[i,2] = 'background-color:#581845;color:white;border-bottom: 2px solid black'
    return df
def highlight_cells12(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 1, 10, 11, 13]:
        df.iloc[i,2] = 'background-color:yellow;color:black;border-bottom: 2px solid black'
    return df
def highlight_cells13(x):
    df = x.copy()
    df.loc[:,:] = ''
    for i in [0, 6, 8, 9, 10, 11, 12, 15]:
        df.iloc[i,3] = 'background-color:#16F529;color:black;border-bottom: 2px solid black'
    return df
def borders(x):
    df = x.copy()
    df.loc[:,:] = ''
    df.iloc[10:14,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,0] = 'border-right: 6px solid blue'
    df.iloc[14:18,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[11:14,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[15:18,0] = 'border-right: 6px solid blue'
    df.iloc[15:18,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    df.iloc[21:23,1:4] = 'border-bottom: 6px solid blue'
    df.iloc[22,0] = 'border-right: 6px solid blue'
    df.iloc[22:23,-2] = 'border-bottom: 6px solid blue;border-right: 6px solid blue'
    return df
system_rating_nc172 = (
    system_rating_nc172.style.apply(Ratings_highlight9, axis=None)
    .set_table_styles([
        {'selector': '', 'props': [('border', '3px solid #581845')]},
        {'selector': 'thead', 'props': [('background-color', 'white'), ('color', '#581845')]},  # heading
        # {'selector': 'td', 'props': [('background-color', 'white'), ('color', ' black')]},  # inside chart
        {'selector': 'th.row_heading', 'props': [('background-color', 'white'), ('color', '#581845')]},  # index
    ])
    .apply(Ratings8, axis=None)
    .apply(highlight_cells10, axis=None)
    .apply(highlight_cells90, axis=None)
    .apply(highlight_cells11, axis=None)
    .apply(highlight_cells12, axis=None)
    .apply(highlight_cells13, axis=None)
    # .set_table_attributes("style='display:inline'")
    # .set_caption('Caption table 2')
    # .apply(borders, axis=None)
)
# display_html(df1_style._repr_html_() + df2_style._repr_html_(), raw=True)
Saving the System_rating_nc172 dataframe to the System_rating_nc172.png file as an image to be used for the analysis later on.
dfi.export(system_rating_nc172, 'system_rating_nc172.png')
The 'system_rating_nc172' dataframe.
This allows both dataframes to be displayed side by side.
def display_side_by_side5(*args):
    html_str = "<center><font size=6 style='color:#FF2400'>The Return On Investment on NC-17 Rated Movies.</font></center> <br> "
    for df in args:
        html_str += df.to_html()
    display_html(
        html_str.replace('table','table style="display:inline"'),
        raw=True
    )
Getting the RIO of all the 'R' rated movies.
RIO_R = []
for i in r_percent_profit:
    i /= 10
    RIO_R.append(i)
RIO_R.sort(reverse=True)
print(RIO_R) #showing the RIO_R list
[267.0, 244.8, 185.2, 155.7, 133.2, 132.7, 108.1, 105.6, 97.0, 85.0, 81.5, 81.3, 80.8, 73.1, 70.7, 67.5, 60.1, 59.3, 57.5, 50.4, 50.1, 46.5, 44.4, 41.8, 41.1, 40.8, 35.0, 26.3, 25.8, 25.0, 23.8, 21.8, 21.6, 19.9, 19.5, 17.9, 15.6, 13.3, 13.2, 8.1, 7.4, 6.5, 5.4, 4.4, 4.1, 4.0, 3.8, 3.6, 3.5, 2.2, 2.1, 1.6, 1.3, 0.5, 0.4, 0.0, -0.5, -0.9, -1.5, -2.0, -2.6, -2.9, -4.4, -4.9, -5.3, -5.7, -5.8, -6.2, -7.3, -7.5, -7.8, -8.1, -8.2, -8.3, -9.5, -9.6, -9.8]
Checking the number of elements left in the 'RIO_R' list once the last 22 entries (the zero and negative values) are dropped.
len(RIO_R[:-22])
55
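The slice `[:-22]` depends on a hand count of the non-positive entries at the end of the sorted list. A minimal sketch of an alternative, assuming the intent is simply to keep the positive values: filter on the sign so the cutoff follows the data. The function name `positive_rio` and the short sample list are illustrative and do not come from the notebook.

```python
def positive_rio(percent_profit):
    """Scale percent-profit values as the notebook does (divide by 10),
    sort them from highest to lowest and keep only the positive ones."""
    scaled = sorted((value / 10 for value in percent_profit), reverse=True)
    return [value for value in scaled if value > 0]


# Hypothetical sample, not the notebook's r_percent_profit list.
sample_percent_profit = [2670.0, -98.0, 4.0, 0.0, 1557.0, -5.0]
kept = positive_rio(sample_percent_profit)
print(kept)       # [267.0, 155.7, 0.4]
print(len(kept))  # 3
```

With this approach the same call would work for every rating, whatever the number of loss-making titles.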
Formatting the RIO values as currency (dollars).
currency_R = []
for i in RIO_R[:-22]:
    currency_R.append("${:,.2f}".format(i))
print(currency_R) #showing the currency_R list
['$267.00', '$244.80', '$185.20', '$155.70', '$133.20', '$132.70', '$108.10', '$105.60', '$97.00', '$85.00', '$81.50', '$81.30', '$80.80', '$73.10', '$70.70', '$67.50', '$60.10', '$59.30', '$57.50', '$50.40', '$50.10', '$46.50', '$44.40', '$41.80', '$41.10', '$40.80', '$35.00', '$26.30', '$25.80', '$25.00', '$23.80', '$21.80', '$21.60', '$19.90', '$19.50', '$17.90', '$15.60', '$13.30', '$13.20', '$8.10', '$7.40', '$6.50', '$5.40', '$4.40', '$4.10', '$4.00', '$3.80', '$3.60', '$3.50', '$2.20', '$2.10', '$1.60', '$1.30', '$0.50', '$0.40']
Checking the number of elements in the 'currency_R' list.
len(currency_R)
55
Getting the Mean of RIO of all the 'R' rated movies.
avg_R = statistics.mean(RIO_R[:-22])
avg_R
50.88727272727272
Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'R' rated movies.
np.percentile(RIO_R[:-22], [25,50,75])
array([ 6.95, 26.3 , 71.9 ])
Getting the Name of all the 'R' rated movies to create the dataframe_RIO_r dataframe.
final_name_r = []
reversed_name = []
for x,i in enumerate(r_name):
    reversed_name.append((r_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-22]:
    final_name_r.append(i[1])
print(final_name_r) #showing the final_name_r list
['A Ghost Story', 'Black Swan', 'Ghost Story', 'Blue Valentine', 'Boyhood', 'Fifty Shades of Grey', 'Whiplash', 'The Witch', 'Buried', 'Unsane', 'Manchester by the Sea', 'Ordinary People', 'Fame', 'Silent House', "Winter's Bone", 'Before Midnight', 'Hereditary', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Gone Girl', 'Margin Call', 'The Florida Project', 'Martha Marcy May Marlene', 'Flight', 'Quartet', 'We Are Your Friends', 'Django Unchained', 'Carol', 'Mommy', 'Addicted', 'The Ides of March', 'Sound of My Voice', 'Knock Knock', 'Arbitrage', 'Ex Machina', 'Room', 'Zero Dark Thirty', 'The Debt', 'Melancholia', 'For Colored Girls', 'Endless Love', 'If Beale Street Could Talk', 'We Need to Talk About Kevin', 'Nocturnal Animals', 'Let Me In', 'Priest', 'The Water Diviner', 'Crimson Peak', 'The Master', 'Raggedy Man', 'Zoot Suit', 'Palo Alto', 'Rich and Famous', 'Take Shelter', 'Locke']
Checking the number of elements in the 'final_name_r' list.
len(final_name_r)
55
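The values and the titles are sorted in two separate passes, which stays consistent only while both sorts use the same key. A minimal sketch, on hypothetical sample data, that keeps each title paired with its value through the sort and the filter. The name `rank_titles` is an assumption introduced for illustration.

```python
def rank_titles(names, percent_profit):
    """Return (title, scaled value) pairs, highest value first, positives only."""
    pairs = sorted(zip(percent_profit, names), reverse=True)
    return [(name, value / 10) for value, name in pairs if value > 0]


# Hypothetical titles and values, for illustration only.
names = ["Film A", "Film B", "Film C", "Film D"]
percent_profit = [40.0, 2670.0, -98.0, 1557.0]

ranked = rank_titles(names, percent_profit)
for title, value in ranked:
    print(f"{title}: ${value:,.2f}")
```

The two columns of the dataframe could then be read off the same list of pairs, so a title can never drift away from its value.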
The dataframe_RIO_r dataframe is created.
dataframe_RIO_r = pd.DataFrame({"Name of Movie":final_name_r,
"Money Generated for Every $1 Spent":currency_R})
The 'dataframe_RIO_r' dataframe. (this dataframe is interactive)
dataframe_RIO_r
| Name of Movie | Money Generated for Every $1 Spent |
|---|---|
Getting the RIO of all the 'G' rated movies.
RIO_G = []
for i in g_percent_profit:
    i /= 10
    RIO_G.append(i)
RIO_G.sort(reverse=True)
print(RIO_G) #showing the RIO_G list
[3113.5, 339.0, 209.3, 209.2, 162.9, 130.0, 72.0, 60.6, 57.8, 40.4, 37.7, 37.2, 36.5, 33.4, 32.4, 26.7, 26.6, 24.4, 19.1, 16.6, 8.3, 7.7, 6.9, 5.1, 5.0, -2.3, -4.1, -4.1, -5.2, -6.4, -8.0, -8.8, -9.9, -9.9]
Checking the number of elements left in the 'RIO_G' list once the last 9 entries (the negative values) are dropped.
len(RIO_G[:-9])
25
Formatting the RIO values as currency (dollars).
currency_G = []
for i in RIO_G[:-9]:
    currency_G.append("${:,.2f}".format(i))
print(currency_G) #showing the currency_G list
['$3,113.50', '$339.00', '$209.30', '$209.20', '$162.90', '$130.00', '$72.00', '$60.60', '$57.80', '$40.40', '$37.70', '$37.20', '$36.50', '$33.40', '$32.40', '$26.70', '$26.60', '$24.40', '$19.10', '$16.60', '$8.30', '$7.70', '$6.90', '$5.10', '$5.00']
Checking the number of elements in the 'currency_G' list.
len(currency_G)
25
Getting the Mean of RIO of all the 'G' rated movies.
avg_G = statistics.mean(RIO_G[:-9])
avg_G
188.732
Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'G' rated movies.
np.percentile(RIO_G[:-9], [25,50,75])
array([19.1, 36.5, 72. ])
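As a cross-check, the same quartiles can be reproduced with the standard library alone. The sketch below uses the 25 positive 'G' values printed above and assumes Python 3.8 or later for `statistics.quantiles`; the variable names are illustrative.

```python
import statistics

# The 25 positive 'G' values printed above (RIO_G[:-9]).
rio_g_positive = [3113.5, 339.0, 209.3, 209.2, 162.9, 130.0, 72.0, 60.6, 57.8,
                  40.4, 37.7, 37.2, 36.5, 33.4, 32.4, 26.7, 26.6, 24.4, 19.1,
                  16.6, 8.3, 7.7, 6.9, 5.1, 5.0]

# method='inclusive' interpolates linearly between neighbouring values,
# which matches the default behaviour of np.percentile.
quartiles = statistics.quantiles(rio_g_positive, n=4, method='inclusive')
print(quartiles)  # [19.1, 36.5, 72.0]

mean = statistics.mean(rio_g_positive)
median = statistics.median(rio_g_positive)
print(round(mean, 3), median)  # 188.732 36.5
```

The mean sits far above the median here because of the single 3,113.5 value, so the median and the quartiles are probably the safer summary for this rating.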
Getting the Name of all the 'G' rated movies to create the dataframe_RIO_g dataframe.
final_name_g = []
reversed_name = []
for x,i in enumerate(g_name):
    reversed_name.append((g_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-9]:
    final_name_g.append(i[1])
print(final_name_g) #showing the final_name_g list
['Bambi 1942', 'The Sound of Music', 'Beauty and the Beast 1991', 'The Lion King 1994', 'The Secret Garden', 'The Black Stallion', 'Babe', 'Three Cions in the Fountain', 'Lassie Come Home', 'The Ten Commandments 1966', "Hachiko: A Dog's Story", 'Giant', 'The Hunchback of Notre Drame', 'The Quiet Man', 'My Fair Lady 1964', 'The Rookie', 'The Rookie', 'A Sunday in the Country', 'The Little Rascals', 'Prancer', 'Ramona and Beezus', 'Kit Kittredge: An American Girl', "Charlotte's Web", 'The Tale of Despereaux', 'Pollyanna']
Checking the number of elements in the 'final_name_g' list.
len(final_name_g)
25
The dataframe_RIO_g dataframe is created.
dataframe_RIO_g = pd.DataFrame({"Name of Movie":final_name_g,
"Money Generated for Every $1 Spent":currency_G})
The 'dataframe_RIO_g' dataframe. (this dataframe is interactive)
dataframe_RIO_g
| Name of Movie | Money Generated for Every $1 Spent |
|---|---|
Getting the RIO of all the 'PG' rated movies.
RIO_PG = []
for i in pg_percent_profit:
    i /= 10
    RIO_PG.append(i)
RIO_PG.sort(reverse=True)
print(RIO_PG) #showing the RIO_PG list
[1088.7, 659.5, 632.6, 313.3, 236.6, 236.6, 214.7, 143.0, 142.3, 87.6, 74.1, 70.9, 69.5, 66.2, 63.6, 62.0, 54.0, 47.1, 44.2, 37.5, 36.9, 36.3, 36.2, 34.6, 32.6, 30.2, 26.2, 25.6, 23.9, 19.5, 18.7, 16.0, 15.6, 13.7, 11.5, 10.3, 9.9, 9.0, 9.0, 8.8, 7.1, 5.2, 3.8, 2.9, 2.1, 0.0, -0.4, -0.5, -2.1, -2.2, -2.6, -3.0, -3.3, -4.5, -4.7, -4.8, -4.9, -5.5, -5.7, -5.9, -6.2, -7.0, -7.3, -8.1, -8.9, -9.0, -9.9]
Checking the number of elements left in the 'RIO_PG' list once the last 22 entries (the zero and negative values) are dropped.
len(RIO_PG[:-22])
45
Formatting the RIO values as currency (dollars).
currency_PG = []
for i in RIO_PG[:-22]:
    currency_PG.append("${:,.2f}".format(i))
print(currency_PG) #showing the currency_PG list
['$1,088.70', '$659.50', '$632.60', '$313.30', '$236.60', '$236.60', '$214.70', '$143.00', '$142.30', '$87.60', '$74.10', '$70.90', '$69.50', '$66.20', '$63.60', '$62.00', '$54.00', '$47.10', '$44.20', '$37.50', '$36.90', '$36.30', '$36.20', '$34.60', '$32.60', '$30.20', '$26.20', '$25.60', '$23.90', '$19.50', '$18.70', '$16.00', '$15.60', '$13.70', '$11.50', '$10.30', '$9.90', '$9.00', '$9.00', '$8.80', '$7.10', '$5.20', '$3.80', '$2.90', '$2.10']
Checking the number of elements in the 'currency_PG' list.
len(currency_PG)
45
Getting the Mean of RIO of all the 'PG' rated movies.
avg_PG = statistics.mean(RIO_PG[:-22])
avg_PG
106.43555555555555
Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'PG' rated movies.
np.percentile(RIO_PG[:-22], [25,50,75])
array([13.7, 36.2, 70.9])
Getting the Name of all the 'PG' rated movies to create the dataframe_RIO_pg dataframe.
final_name_pg = []
reversed_name = []
for x,i in enumerate(pg_name):
    reversed_name.append((pg_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-22]:
    final_name_pg.append(i[1])
print(final_name_pg) #showing the final_name_pg list
['Tex', 'Fireproof', 'The Jazz Singer', "God's Not Dead", 'War Room', 'War Room', 'Resurrection', 'Wonder', 'Wonder', 'Footloose', 'Sense and Sensibility', 'Bridge to Terabithia', 'On Golden Pond', 'Overcomer', 'Rocky III', 'The Lunchbox', 'Forever Young', 'Cinderella', 'Little Women', 'Phenomenon', 'Urban Cowboy', 'The Last Song', "Mr. Holland's Opus", 'The Last Song', 'The Remains of the Day', 'A Walk to Remember', 'A River Runs Through It', 'Honeysuckle Rose', 'Absence of Malice', 'Staying Alive', 'The Lake House', 'Dolphin Tale', 'Taps', 'Akeelah and the Bee', 'August Rush', 'The Secret of Roan Inish', 'The Night the Lights Went Out in Georgia', 'Somewhere in Time', 'Contact', 'Tender Mercies', 'The Natural', 'Pure Country', 'The Spanish Prisoner', 'Tuck Everlasting', 'Dreamer']
Checking the number of elements in the 'final_name_pg' list.
len(final_name_pg)
45
The dataframe_RIO_pg dataframe is created.
dataframe_RIO_pg = pd.DataFrame({"Name of Movie":final_name_pg,
"Money Generated for Every $1 Spent":currency_PG})
The 'dataframe_RIO_pg' dataframe. (this dataframe is interactive)
dataframe_RIO_pg
| Name of Movie | Money Generated for Every $1 Spent |
|---|---|
Getting the RIO of all the 'PG-13' rated movies.
RIO_PG13 = []
for i in pg13_percent_profit:
    i /= 10
    RIO_PG13.append(i)
RIO_PG13.sort(reverse=True)
print(RIO_PG13) #showing the RIO_PG13 list
[287.6, 186.8, 165.9, 139.1, 110.2, 94.1, 80.9, 76.0, 75.2, 74.6, 73.4, 62.1, 61.2, 55.9, 53.1, 51.6, 48.8, 46.8, 46.4, 42.8, 40.2, 36.9, 33.2, 32.7, 32.4, 32.1, 30.6, 30.0, 29.7, 29.4, 28.7, 27.9, 27.9, 25.9, 25.3, 23.6, 23.2, 22.8, 21.6, 20.7, 17.4, 17.3, 16.8, 16.6, 16.3, 15.4, 14.4, 14.3, 13.7, 13.0, 12.9, 12.4, 11.7, 9.6, 8.8, 7.6, 6.5, 5.8, 4.5, 3.7, 2.4, 1.9, 1.1, 1.0, -1.7, -2.0, -2.1, -2.2, -4.2, -4.5, -4.7, -5.7, -5.9, -7.2, -7.6, -7.7]
Checking the number of elements left in the 'RIO_PG13' list once the last 12 entries (the negative values) are dropped.
len(RIO_PG13[:-12])
64
Formatting the RIO values as currency (dollars).
currency_PG13 = []
for i in RIO_PG13[:-12]:
    currency_PG13.append("${:,.2f}".format(i))
print(currency_PG13) #showing the currency_PG13 list
['$287.60', '$186.80', '$165.90', '$139.10', '$110.20', '$94.10', '$80.90', '$76.00', '$75.20', '$74.60', '$73.40', '$62.10', '$61.20', '$55.90', '$53.10', '$51.60', '$48.80', '$46.80', '$46.40', '$42.80', '$40.20', '$36.90', '$33.20', '$32.70', '$32.40', '$32.10', '$30.60', '$30.00', '$29.70', '$29.40', '$28.70', '$27.90', '$27.90', '$25.90', '$25.30', '$23.60', '$23.20', '$22.80', '$21.60', '$20.70', '$17.40', '$17.30', '$16.80', '$16.60', '$16.30', '$15.40', '$14.40', '$14.30', '$13.70', '$13.00', '$12.90', '$12.40', '$11.70', '$9.60', '$8.80', '$7.60', '$6.50', '$5.80', '$4.50', '$3.70', '$2.40', '$1.90', '$1.10', '$1.00']
Checking the number of elements in the 'currency_PG13' list.
len(currency_PG13)
64
Getting the Name of all the 'PG-13' rated movies to create the dataframe_RIO_pg13 dataframe.
final_name_pg13 = []
reversed_name = []
for x,i in enumerate(pg13_name):
    reversed_name.append((pg13_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-12]:
    final_name_pg13.append(i[1])
print(final_name_pg13) #showing the final_name_pg13 list
['Lights Out', 'A Quiet Place', 'Courageous', 'Like Crazy', 'Another Earth', 'Me Before You', 'Ouija: Origin of Evil', 'The Woman in Black', 'The Help', 'Sing', 'Still Alice', 'True Grit', 'If I Stay', 'The Vow', 'Gravity', 'Everything, Everything', 'Ida', 'Dear John', 'Brooklyn', 'Gifted', 'Step Up Revolution', 'Creed', 'Arrival', 'Creed II', 'The Impossible', 'The Bye Bye Man', 'Bridge of Spies', 'The Book Thief', 'Mustang', 'One Day', 'The Lucky One', 'Before I Fall', 'Amour', 'The Post', 'Remember Me', 'Safe Haven', 'Rings', 'The Roommate', 'Mud', 'Water for Elephants', 'Project Almanac', 'The Words', 'Fences', 'The Giver', 'The Rite', 'The Perks of Being a Wallflower', 'Black or White', 'Suffragette', 'Collateral Beauty', 'The Age of Adaline', 'Contagion', 'Beastly', 'Hereafter', 'Wish Upon', 'The Longest Ride', 'The Tree of Life', 'Burlesque', 'The Best of Me', 'Anna Karenina', 'Country Strong', 'Rabbit Hole', 'Draft Day', 'The Light Between Oceans', 'Charlie St. Cloud']
Checking the number of elements in the 'final_name_pg13' list.
len(final_name_pg13)
64
Getting the Mean of RIO of all the 'PG-13' rated movies.
avg_PG13 = statistics.mean(RIO_PG13[:-12])
avg_PG13
41.44375
Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'PG-13' rated movies.
np.percentile(RIO_PG13[:-12], [25,50,75])
array([14.15, 27.9 , 49.5 ])
The dataframe_RIO_pg13 dataframe is created.
dataframe_RIO_pg13 = pd.DataFrame({"Name of Movie":final_name_pg13,
"Money Generated for Every $1 Spent":currency_PG13})
The 'dataframe_RIO_pg13' dataframe. (this dataframe is interactive)
dataframe_RIO_pg13
| Name of Movie | Money Generated for Every $1 Spent |
|---|---|
Getting the RIO of all the 'NC-17' rated movies.
RIO_NC = []
for i in nc17_percent_profit:
    i /= 10
    RIO_NC.append(i)
RIO_NC.sort(reverse=True)
print(RIO_NC) #showing the RIO_NC list
[334.8, 279.2, 191.7, 167.6, 159.3, 155.7, 145.7, 128.9, 126.1, 126.1, 99.3, 80.0, 66.1, 38.7, 37.8, 34.7, 33.4, 33.4, 28.2, 21.4, 21.4, 21.4, 21.4, 16.1, 14.0, 13.2, 10.4, 6.9, 4.8, 3.9, 0.7, 0.2, 0.1, 0.1, -0.7, -1.6, -5.2, -5.3, -5.5, -5.5, -6.0, -6.3, -6.9, -7.8, -7.9, -8.5, -8.7, -9.0, -9.3]
Checking the number of elements left in the 'RIO_NC' list once the last 15 entries (the negative values) are dropped.
len(RIO_NC[:-15])
34
Formatting the RIO values as currency (dollars).
currency_NC = []
for i in RIO_NC[:-15]:
    currency_NC.append("${:,.2f}".format(i))
print(currency_NC) #showing the currency_NC list
['$334.80', '$279.20', '$191.70', '$167.60', '$159.30', '$155.70', '$145.70', '$128.90', '$126.10', '$126.10', '$99.30', '$80.00', '$66.10', '$38.70', '$37.80', '$34.70', '$33.40', '$33.40', '$28.20', '$21.40', '$21.40', '$21.40', '$21.40', '$16.10', '$14.00', '$13.20', '$10.40', '$6.90', '$4.80', '$3.90', '$0.70', '$0.20', '$0.10', '$0.10']
Checking the number of elements in the 'currency_NC' list.
len(currency_NC)
34
Getting the Name of all the 'NC-17' rated movies to create the dataframe_RIO_NC dataframe.
final_name_NC = []
reversed_name = []
for x,i in enumerate(nc17_name):
    reversed_name.append((nc17_percent_profit[x], i))
reversed_name.sort(reverse = True)
for i in reversed_name[:-15]:
    final_name_NC.append(i[1])
print(final_name_NC) #showing the final_name_NC list
['Pink Flamingos', 'Last Tango in Paris', 'Whore 1991', 'Hell', 'Clerks', 'Blue Valentine', 'Crash', 'Tokyo Decadence', 'Kids', 'Kids', 'Crash', 'Beyond the Valley of the Dolls', 'The Evil Dead', 'Blue Is the Warmest Colour', 'Blue Is the Warmest Colour', 'Lust, Caution', 'Se, jie', 'Lust, Caution ', 'Arabian Nights', 'Shame', 'Shame', 'Shame', 'Shame', 'Happiness 1998', 'Law of Desire', 'Two Girls and a Guy', 'Bad Lieutenant', 'Wide Sargasso Sea', 'Natural Born Killers', 'Matador', 'Elles', 'The Dreamers', 'Whore', 'The Dreamers']
Checking the number of elements in the 'final_name_NC' list.
len(final_name_NC)
34
Getting the Mean of RIO of all the 'NC-17' rated movies.
avg_NC = statistics.mean(RIO_NC[:-15])
avg_NC
71.25588235294117
Getting the 25th, 50th and the 75th percentiles of the RIO of all the 'NC-17' rated movies.
np.percentile(RIO_NC[:-15], [25,50,75])
array([ 13.4, 33.4, 126.1])
The dataframe_RIO_NC dataframe is created.
dataframe_RIO_NC = pd.DataFrame({"Name of Movie":final_name_NC,
"Money Generated for Every $1 Spent":currency_NC})
The 'dataframe_RIO_NC' dataframe. (this dataframe is interactive)
dataframe_RIO_NC
| Name of Movie | Money Generated for Every $1 Spent |
|---|---|
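The five ratings are summarised in separate cells, which makes them hard to compare at a glance. The sketch below gathers the counts, means and medians printed above (means rounded to two decimals) into one small table ordered by median. The dictionary layout and the name `by_median` are assumptions made for illustration.

```python
# Counts, means and medians as printed in the cells above.
summary = {
    "R": {"count": 55, "mean": 50.89, "median": 26.3},
    "G": {"count": 25, "mean": 188.73, "median": 36.5},
    "PG": {"count": 45, "mean": 106.44, "median": 36.2},
    "PG-13": {"count": 64, "mean": 41.44, "median": 27.9},
    "NC-17": {"count": 34, "mean": 71.26, "median": 33.4},
}

# Order the ratings by median, highest first.
by_median = sorted(summary, key=lambda rating: summary[rating]["median"], reverse=True)

print(f"{'Rating':<7}{'Count':>6}{'Mean':>9}{'Median':>8}")
for rating in by_median:
    row = summary[rating]
    print(f"{rating:<7}{row['count']:>6}{row['mean']:>9.2f}{row['median']:>8.1f}")
```

Ordering by median rather than by mean is a design choice: the means for 'G' and 'PG' are each lifted by a few very large values, while the medians stay close together.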
Styling the first portion of the 'dataframe_RIO_r' dataframe as the 'dataframe_RIO_r1' dataframe.
dataframe_RIO_r1 = dataframe_RIO_r[:19].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#ff5500")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #ff5500'),
("color","#ff5500")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#ff5500')]}])
Saving the dataframe_RIO_r1 dataframe to the dataframe_RIO_r1.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_r1, 'dataframe_RIO_r1.png')
The 'dataframe_RIO_r1' dataframe.
Styling the second portion of the 'dataframe_RIO_r' dataframe as the 'dataframe_RIO_r2' dataframe.
dataframe_RIO_r2 = dataframe_RIO_r[19:37].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#ff5500")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #ff5500'),
("color","#ff5500")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#ff5500')]}])
Saving the dataframe_RIO_r2 dataframe to the dataframe_RIO_r2.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_r2, 'dataframe_RIO_r2.png')
The 'dataframe_RIO_r2' dataframe.
Styling the last portion of the 'dataframe_RIO_r' dataframe as the 'dataframe_RIO_r3' dataframe.
dataframe_RIO_r3 = dataframe_RIO_r[37:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #ff5500')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#ff5500")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #ff5500'),
("color","#ff5500")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#ff5500')]}])
Saving the dataframe_RIO_r3 dataframe to the dataframe_RIO_r3.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_r3, 'dataframe_RIO_r3.png')
The 'dataframe_RIO_r3' dataframe.
Styling the first portion of the 'dataframe_RIO_g' dataframe as the 'dataframe_RIO_g1' dataframe.
dataframe_RIO_g1 = dataframe_RIO_g[:12].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid red')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","red")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid red'),
("color","red")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','red')]}])
Saving the dataframe_RIO_g1 dataframe to the dataframe_RIO_g1.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_g1, 'dataframe_RIO_g1.png')
The 'dataframe_RIO_g1' dataframe.
Styling the second portion of the 'dataframe_RIO_g' dataframe as the 'dataframe_RIO_g2' dataframe.
dataframe_RIO_g2 = dataframe_RIO_g[12:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid red')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","red")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid red'),
("color","red")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','red')]}])
Saving the dataframe_RIO_g2 dataframe to the dataframe_RIO_g2.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_g2, 'dataframe_RIO_g2.png')
The 'dataframe_RIO_g2' dataframe.
Styling the first portion of the 'dataframe_RIO_pg' dataframe as the 'dataframe_RIO_pg1' dataframe.
dataframe_RIO_pg1 = dataframe_RIO_pg[:22].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #fa5f55')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#fa5f55")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #fa5f55'),
("color","#fa5f55")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#fa5f55')]}])
Saving the dataframe_RIO_pg1 dataframe to the dataframe_RIO_pg1.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_pg1, 'dataframe_RIO_pg1.png')
The 'dataframe_RIO_pg1' dataframe.
Styling the second portion of the 'dataframe_RIO_pg' dataframe as the 'dataframe_RIO_pg2' dataframe.
dataframe_RIO_pg2 = dataframe_RIO_pg[22:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #fa5f55')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#fa5f55")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #fa5f55'),
("color","#fa5f55")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#fa5f55')]}])
Saving the dataframe_RIO_pg2 dataframe to the dataframe_RIO_pg2.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_pg2, 'dataframe_RIO_pg2.png')
The 'dataframe_RIO_pg2' dataframe.
Styling the first portion of the 'dataframe_RIO_pg13' dataframe as the 'dataframe_RIO_pg131' dataframe.
dataframe_RIO_pg131 = dataframe_RIO_pg13[:22].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#DE3163")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #DE3163'),
("color","#DE3163")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#DE3163')]}])
Saving the dataframe_RIO_pg131 dataframe to the dataframe_RIO_pg131.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_pg131, 'dataframe_RIO_pg131.png')
The 'dataframe_RIO_pg131' dataframe.
Styling the second portion of the 'dataframe_RIO_pg13' dataframe as the 'dataframe_RIO_pg132' dataframe.
dataframe_RIO_pg132 = dataframe_RIO_pg13[22:42].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#DE3163")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #DE3163'),
("color","#DE3163")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#DE3163')]}])
Saving the dataframe_RIO_pg132 dataframe to the dataframe_RIO_pg132.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_pg132, 'dataframe_RIO_pg132.png')
The 'dataframe_RIO_pg132' dataframe.
Styling the last portion of the 'dataframe_RIO_pg13' dataframe as the 'dataframe_RIO_pg133' dataframe.
dataframe_RIO_pg133 = dataframe_RIO_pg13[42:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #DE3163')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#DE3163")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #DE3163'),
("color","#DE3163")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#DE3163')]}])
Saving the dataframe_RIO_pg133 dataframe to the dataframe_RIO_pg133.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_pg133, 'dataframe_RIO_pg133.png')
The 'dataframe_RIO_pg133' dataframe.
Styling the first portion of the 'dataframe_RIO_NC' dataframe as the 'dataframe_RIO_NC1' dataframe.
dataframe_RIO_NC1 = dataframe_RIO_NC[:17].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #581845')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#581845")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #581845'),
("color","#581845")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#581845')]}])
Saving the dataframe_RIO_NC1 dataframe to the dataframe_RIO_NC1.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_NC1, 'dataframe_RIO_NC1.png')
The 'dataframe_RIO_NC1' dataframe.
Styling the second portion of the 'dataframe_RIO_NC' dataframe as the 'dataframe_RIO_NC2' dataframe.
dataframe_RIO_NC2 = dataframe_RIO_NC[17:].style.set_table_styles([{'selector':' ', 'props': [('border','10px solid #581845')]},
{"selector":"thead", 'props':[("background-color","white"),
("color","#581845")]},
{'selector':"td", "props":[("background-color","white"),('border-bottom',
'4px solid #581845'),
("color","#581845")]},
{'selector':'th.row_heading', 'props':[('background-color','white'),
('color','#581845')]}])
Saving the dataframe_RIO_NC2 dataframe to the dataframe_RIO_NC2.png file as an image to be used for the analysis later on.
dfi.export(dataframe_RIO_NC2, 'dataframe_RIO_NC2.png')
The 'dataframe_RIO_NC2' dataframe.
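The twelve styling cells above differ only in the slice and the accent colour. A minimal sketch of a helper that builds the style list once, assuming the four selectors stay the same for every table. The function name `rio_table_styles` and its `background` parameter are hypothetical.

```python
def rio_table_styles(color, background="white"):
    """Build the list of style dicts that the cells above pass to
    Styler.set_table_styles, parameterised by the accent colour."""
    return [
        {"selector": " ", "props": [("border", f"10px solid {color}")]},
        {"selector": "thead", "props": [("background-color", background),
                                        ("color", color)]},
        {"selector": "td", "props": [("background-color", background),
                                     ("border-bottom", f"4px solid {color}"),
                                     ("color", color)]},
        {"selector": "th.row_heading", "props": [("background-color", background),
                                                 ("color", color)]},
    ]


styles = rio_table_styles("#ff5500")
print(styles[0])
```

In the notebook, a call such as `dataframe_RIO_r[:19].style.set_table_styles(rio_table_styles('#ff5500'))` could then stand in for the literal list in each cell.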
Getting the budget spent on all of the R-rated movies.
cost = []
for i in system_rating_r.Cost:
    i = int(i.replace('$', '').replace(',', ''))
    cost.append(i)
print(cost) #showing the cost list
[100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000, 40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000, 4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000, 2700000, 11500000, 9000000]
Checking the number of elements in the 'cost' list.
len(cost)
55
Putting the cost of all the R-rated movies into a dataframe called df_cost_r.
df_cost_r = pd.DataFrame({"Cost":cost})
The 'df_cost_r' dataframe. (this dataframe is interactive)
df_cost_r
| Cost |
|---|
Getting the Arithmetic Mean of the expenses of all the R-rated movies.
x = statistics.mean(cost)
print("Arithmetic Mean of the cost for the R-rated movies is:", x)
Arithmetic Mean of the cost for the R-rated movies is: 16455866.363636363
Getting the Median of the expenses of all the R-rated movies.
print("Median of the cost for the R-rated movies is:", statistics.median(cost))
Median of the cost for the R-rated movies is: 9000000
Getting the Mode of the expenses of all the R-rated movies.
print("Mode of the cost for the R-rated movies is:",statistics.mode(cost))
Mode of the cost for the R-rated movies is: 2000000
Getting the sample Standard Deviation (ddof=1) of the expenses of all the R-rated movies.
print("Standard deviation of the cost for the R-rated movies is:", np.std(cost, ddof=1))
Standard deviation of the cost for the R-rated movies is: 20757148.21084636
Getting the sample Variance of the expenses of all the R-rated movies.
print("Variance of the cost for the R-rated movies is:",statistics.variance(cost))
Variance of the cost for the R-rated movies is: 430859201847042.1
Getting the Coefficient of Variation (standard deviation as a percentage of the mean) of the expenses of all the R-rated movies.
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the cost for the R-rated movies is:", cv(cost))
Coefficient of Variation of the cost for the R-rated movies is: 126.13828863313347
Getting the First Quartile of the expenses of all the R-rated movies.
# First quartile (Q1)
Q1 = np.percentile(cost, 25, interpolation = 'midpoint')
print("The Q1 of the cost for the R-rated movies is:",Q1)
The Q1 of the cost for the R-rated movies is: 2350000.0
Getting the Third Quartile of the expenses of all the R-rated movies.
# Third quartile (Q3)
Q3 = np.percentile(cost, 75, interpolation = 'midpoint')
print("The Q3 of the cost for the R-rated movies is:",Q3)
The Q3 of the cost for the R-rated movies is: 20500000.0
Getting the Interquartile Range of the expenses of all the R-rated movies.
# Interquartile range (IQR)
IQR = Q3 - Q1
print("The interquartile range of the cost for the R-rated movies is:",IQR)
The interquartile range of the cost for the R-rated movies is: 18150000.0
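To make the 'midpoint' rule concrete, the sketch below recomputes Q1, Q3 and the IQR from the cost list printed above without NumPy. It assumes that 'midpoint' averages the two order statistics on either side of the percentile position. The helper name `percentile_midpoint` is illustrative.

```python
import math

# The 55 R-rated budgets printed above.
cost = [100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
        40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
        20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
        11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
        4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000,
        2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000,
        1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000,
        2700000, 11500000, 9000000]


def percentile_midpoint(values, percent):
    """Average the two neighbouring order statistics around the percentile position."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * percent / 100
    lower = ordered[math.floor(position)]
    upper = ordered[math.ceil(position)]
    return (lower + upper) / 2


q1 = percentile_midpoint(cost, 25)
q3 = percentile_midpoint(cost, 75)
print(q1, q3, q3 - q1)  # 2350000.0 20500000.0 18150000.0
```

With 55 values the 25th percentile falls at position 13.5, halfway between the 14th and 15th smallest budgets (2,000,000 and 2,700,000), which is where 2,350,000 comes from.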
Getting the Pearson’s Coefficient of Skewness of the expenses of all the R-rated movies.
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the cost for the R-rated movies is:",
      pearsons( statistics.mean(cost),statistics.median(cost),np.std(cost, ddof=1)))
Pearson’s Coefficient of Skewness of the cost for the R-rated movies is: 1.0775853630616372
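The coefficient above is Pearson's second (median) skewness, 3 × (mean − median) / standard deviation. As a worked check, the sketch below plugs in the mean, median and standard deviation printed in the earlier cells. The function name is an assumption introduced here.

```python
def pearson_second_skewness(mean, median, standard_deviation):
    """Pearson's second (median) skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (mean - median) / standard_deviation


# Figures printed in the cells above for the R-rated budgets.
skew = pearson_second_skewness(16455866.363636363, 9000000, 20757148.21084636)
print(round(skew, 4))  # 1.0776
```

A positive value of this size says the mean lies well to the right of the median, which fits a list dominated by small budgets with a few very large ones.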
Getting the Chebyshev's Theorem bounds for the expenses of all the R-rated movies.
def chebyshevs(mean, standard_deviation, num_std):
    # Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within k standard deviations of the mean
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0 # a budget cannot be negative
    upper_range = mean + position_std
    percent = round((1 - 1/num_std**2)*100, 1)
    print('At least', str(percent)+'%', 'of the budget of the r-rated movies ranges from',lower_range,'to',upper_range)
chebyshevs(16455866, 20757148, 2)
chebyshevs(16455866, 20757148, 3)
At least 75.0% of the budget of the r-rated movies ranges from 0 to 57970162
At least 88.9% of the budget of the r-rated movies ranges from 0 to 78727310
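Chebyshev's theorem states that at least 1 − 1/k² of the values lie within k standard deviations of the mean, for any k greater than 1. It bounds the whole interval around the mean and does not by itself guarantee a share for the band between two and three standard deviations. A minimal sketch of that bound, using the rounded mean and standard deviation from the cell above; the function name and the `floor` parameter are illustrative.

```python
def chebyshev_interval(mean, standard_deviation, k, floor=0):
    """Return (minimum share, lower bound, upper bound) for k standard
    deviations around the mean; the share is 1 - 1/k**2 and needs k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    share = 1 - 1 / k**2
    lower = max(mean - k * standard_deviation, floor)  # budgets cannot go below zero
    upper = mean + k * standard_deviation
    return share, lower, upper


for k in (2, 3):
    share, lower, upper = chebyshev_interval(16455866, 20757148, k)
    print(f"k={k}: at least {share:.1%} of budgets lie between {lower:,} and {upper:,}")
```

For k = 2 this gives at least 75% between 0 and 57,970,162, and for k = 3 at least 88.9% between 0 and 78,727,310.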
Getting the Kurtosis of the expenses of all the R-rated movies.
print('Kurtosis of the budget of the r-rated movies is:',kurtosis(cost, fisher=False))
print('Excess Kurtosis of the budget of the r-rated movies is:',
      (kurtosis(cost,fisher=False)-3))#leptokurtic
Kurtosis of the budget of the r-rated movies is: 6.718303925498721
Excess Kurtosis of the budget of the r-rated movies is: 3.718303925498721
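For reference, the Pearson kurtosis returned with `fisher=False` is the fourth central moment divided by the squared second central moment, and a normal distribution scores 3. The sketch below spells that ratio out on a small hypothetical sample; `kurtosis_pearson` is an illustrative name and the code assumes the biased (population) moments that SciPy uses by default.

```python
def kurtosis_pearson(values):
    """Population (biased) kurtosis m4 / m2**2; a normal distribution gives 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2**2


sample = [1, 2, 3, 4, 5]  # hypothetical data, not the budget list
k = kurtosis_pearson(sample)
print(k)      # Pearson kurtosis
print(k - 3)  # excess kurtosis: negative means flatter tails than a normal curve
```

The budgets score about 6.72, an excess of about 3.72, so their tails are much heavier than a normal distribution's (leptokurtic).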
Getting the Arithmetic Mean and the 10% Trimmed Mean of the expenses of all the R-rated movies.
print("Arithmetic Mean of the cost for the R-rated movies is:", statistics.mean(cost))
print('10% Trimmed mean of the budget of the r-rated movies is:',stats.trim_mean(cost, 0.10))
Arithmetic Mean of the cost for the R-rated movies is: 16455866.363636363
10% Trimmed mean of the budget of the r-rated movies is: 12705281.111111112
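A trimmed mean drops a fixed share of the smallest and largest values before averaging, which limits the pull of extreme budgets. A minimal sketch on hypothetical data, assuming the same slicing rule as `stats.trim_mean`: `int(proportion * n)` values are removed from each end. The function name `trimmed_mean` is illustrative.

```python
def trimmed_mean(values, proportion):
    """Drop int(proportion * n) values from each end of the sorted data
    and average what is left."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one outlier
print(sum(sample) / len(sample))   # 14.5
print(trimmed_mean(sample, 0.10))  # 5.5
```

For the 55 budgets, a 10% trim removes the 5 smallest and the 5 largest values and averages the remaining 45.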
Getting the Z-score of the expenses of each of the R-rated movies.
stats.zscore(cost)
array([ 4.06193284, 2.16574487, 2.11712467, 1.87402365, 1.87402365,
1.87402365, 1.75247314, 1.14472059, 1.02317007, 0.70713875,
0.31817711, 0.29386701, 0.29386701, 0.22093671, 0.1723165 ,
0.1723165 , -0.16802493, -0.16802493, -0.16802493, -0.21664513,
-0.21664513, -0.22636917, -0.26526534, -0.31388554, -0.34305766,
-0.38681585, -0.45974615, -0.55698656, -0.56184858, -0.56914161,
-0.60560677, -0.62991687, -0.63477889, -0.63964091, -0.65422697,
-0.70284717, -0.70284717, -0.70284717, -0.70284717, -0.70284717,
-0.70284717, -0.70344763, -0.72715728, -0.75146738, -0.75146738,
-0.75146738, -0.79352386, -0.79522556, -0.50836636, -0.38681585,
0.1723165 , -0.79522556, -0.66881303, -0.24095523, -0.36250575])
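Note that `stats.zscore` defaults to the population standard deviation (ddof=0), while the standard deviation printed earlier uses ddof=1, so the two are not directly interchangeable. The sketch below shows the ddof=0 calculation on a small hypothetical sample, using only the standard library; `z_scores` is an illustrative name.

```python
import statistics


def z_scores(values):
    """Standardise values with the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population std 2
scores = z_scores(sample)
print(scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Read this way, the first budget in the array above (100,000,000) sits about 4.06 population standard deviations above the mean.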
Separating the expenses of all the R-rated movies into four categories: 'micro_bud' (up to 100,000), the lowest end of the expenses; 'low_bud' (100,001 to 15,000,000), the lower end; 'mid_bud' (15,000,001 to 50,000,000), the middle; and 'high_bud' (above 50,000,000), the higher end.
micro_bud = 0
low_bud = 0
mid_bud = 0
high_bud = 0
for i in cost:
    if 0 <= i <= 100000:micro_bud+=1
for i in cost:
    if 100001 <= i <= 15000000:low_bud+=1
for i in cost:
    if 15000001 <= i <= 50000000:mid_bud+=1
for i in cost:
    if 50000001 <= i:high_bud+=1
Showing how many movies are in each category. 'micro_bud' has 2 movies, 'low_bud' has 36 movies, 'mid_bud' has 10 movies and 'high_bud' has 7 movies.
micro_bud,low_bud,mid_bud,high_bud
(2, 36, 10, 7)
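The four counts can also be obtained in a single pass over the list. A minimal sketch, assuming whole-dollar, non-negative budgets and the same cutoffs as above; the function name `budget_tier` is an assumption, and the cost list is the one printed earlier.

```python
from collections import Counter

# The 55 R-rated budgets printed above.
cost = [100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000,
        40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000,
        20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000,
        11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000,
        4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000,
        2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000,
        1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000,
        2700000, 11500000, 9000000]


def budget_tier(amount):
    """Map one budget to the tier names used above."""
    if amount <= 100_000:
        return "micro_bud"
    if amount <= 15_000_000:
        return "low_bud"
    if amount <= 50_000_000:
        return "mid_bud"
    return "high_bud"


tiers = Counter(budget_tier(amount) for amount in cost)
print(tiers["micro_bud"], tiers["low_bud"], tiers["mid_bud"], tiers["high_bud"])  # 2 36 10 7
```

Because every budget is assigned exactly one tier, the four counts always add up to the length of the list.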
Created a function called 'Bernoulli_Dist' to get the Bernoulli probability of each category, that is, the share of the R-rated movies whose expenses fall in it.
def Bernoulli_Dist(micro,low,mid,high,n):
    micro_p = micro / n
    low_p = low / n
    mid_p = mid / n
    high_p = high / n
    return micro_p, low_p, mid_p, high_p
Using the 'Bernoulli_Dist' function to get the share of movies in each category.
p_vals = Bernoulli_Dist(micro_bud,low_bud,mid_bud,high_bud,55)
p_vals
(0.03636363636363636, 0.6545454545454545, 0.18181818181818182, 0.12727272727272726)
Separating the 35 low-budget movies that cost between 1,000,000 and 15,000,000 into three categories. The first category is between 1,000,000 and 5,000,000. The second category is between 5,000,001 and 10,000,000. The third category is between 10,000,001 and 15,000,000.
group1 = 0
group2 = 0
group3 = 0
for i in cost:
    if 1000000 <= i <= 5000000: group1 += 1
for i in cost:
    if 5000001 <= i <= 10000000: group2 += 1
for i in cost:
    if 10000001 <= i <= 15000000: group3 += 1
The first category has 20 movies. The second category has 7 movies. The third category has 8 movies.
group1,group2,group3
(20, 7, 8)
Using the 'Bernoulli_Dist' function to get the proportion of these 35 movies in each category.
p_vals1 = Bernoulli_Dist(group1,group2,group3,0,35)
p_vals1
(0.5714285714285714, 0.2, 0.22857142857142856, 0.0)
Separating the 'mid_bud' movies, with a total of 10 movies, into three categories. The first category is between 15,000,001 and 20,000,000. The second category is between 20,000,001 and 30,000,000. The third category is between 30,000,001 and 50,000,000.
group1 = 0
group2 = 0
group3 = 0
for i in cost:
    if 15000001 <= i <= 20000000: group1 += 1
for i in cost:
    if 20000001 <= i <= 30000000: group2 += 1
for i in cost:
    if 30000001 <= i <= 50000000: group3 += 1
The first category has 3 movies. The second category has 4 movies. The third category has 3 movies.
group1,group2,group3
(3, 4, 3)
Using the 'Bernoulli_Dist' function to get the proportion of the 10 mid-budget movies in each category.
p_vals2 = Bernoulli_Dist(group1,group2,group3,0,10)
p_vals2
(0.3, 0.4, 0.3, 0.0)
Separating the 'high_bud' movies, with a total of 7 movies, into three categories. The first category is between 50,000,001 and 60,000,000. The second category is between 60,000,001 and 70,000,000. The third category is between 90,000,000 and 100,000,000.
group1 = 0
group2 = 0
group3 = 0
for i in cost:
    if 50000001 <= i <= 60000000: group1 += 1
for i in cost:
    if 60000001 <= i <= 70000000: group2 += 1
for i in cost:
    if 90000000 <= i <= 100000000: group3 += 1
The first category has 5 movies. The second category has 1 movie. The third category has 1 movie.
group1,group2,group3
(5, 1, 1)
Using the 'Bernoulli_Dist' function to get the proportion of the 7 high-budget movies in each category.
p_vals3 = Bernoulli_Dist(group1,group2,group3,0,7)
p_vals3
(0.7142857142857143, 0.14285714285714285, 0.14285714285714285, 0.0)
Rounding the expenses of R-rated movies to the nearest million and storing it in a list called 'freq_demo'.
freq_demo = []
for i in cost:
    freq_demo.append(round(i, -6))
print(freq_demo) #showing the freq_demo list
[100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52000000, 40000000, 38000000, 31000000, 23000000, 22000000, 22000000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 12000000, 11000000, 10000000, 9000000, 8000000, 7000000, 5000000, 5000000, 5000000, 4000000, 4000000, 3000000, 3000000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1000000, 1000000, 1000000, 0, 0, 6000000, 8000000, 20000000, 0, 3000000, 12000000, 9000000]
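Calling `round` with `ndigits=-6` rounds to the nearest million, which is why budgets under 500,000 appear as 0 in the list above. A small sketch of that behaviour, assuming the budgets are plain Python integers (the wrapper `round_to_million` and the example values are hypothetical):

```python
def round_to_million(value):
    """Round a whole-dollar amount to the nearest million, as round(value, -6) does."""
    return round(value, -6)

# Hypothetical amounts chosen to show the edge cases
examples = [499_999, 500_001, 1_400_000, 1_600_000, 2_500_000, 3_500_000]
rounded = [round_to_million(v) for v in examples]
print(rounded)
```

For integers, exact ties such as 2,500,000 go to the even multiple (2,000,000 here), so a few values may land one million lower than a "round half up" rule would suggest.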
Checking the number of elements in the 'freq_demo' list.
len(freq_demo)
55
Getting the index of each element in the 'freq_demo' list.
index_freq = []
for i,x in enumerate(freq_demo):index_freq.append((i,x))
print(index_freq) #showing the index_freq list
[(0, 100000000), (1, 61000000), (2, 60000000), (3, 55000000), (4, 55000000), (5, 55000000), (6, 52000000), (7, 40000000), (8, 38000000), (9, 31000000), (10, 23000000), (11, 22000000), (12, 22000000), (13, 21000000), (14, 20000000), (15, 20000000), (16, 13000000), (17, 13000000), (18, 13000000), (19, 12000000), (20, 12000000), (21, 12000000), (22, 11000000), (23, 10000000), (24, 9000000), (25, 8000000), (26, 7000000), (27, 5000000), (28, 5000000), (29, 5000000), (30, 4000000), (31, 4000000), (32, 3000000), (33, 3000000), (34, 3000000), (35, 2000000), (36, 2000000), (37, 2000000), (38, 2000000), (39, 2000000), (40, 2000000), (41, 2000000), (42, 2000000), (43, 1000000), (44, 1000000), (45, 1000000), (46, 0), (47, 0), (48, 6000000), (49, 8000000), (50, 20000000), (51, 0), (52, 3000000), (53, 12000000), (54, 9000000)]
Checking the number of elements in the 'index_freq' list.
len(index_freq)
55
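Manually replacing selected elements of the 'freq_demo' list so that nearby values share one representative value; for example, 61,000,000 becomes 60,000,000 and the three 0 entries become 100,000.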
freq_demo[48] = 10000000
freq_demo[6] = 55000000
freq_demo[22] = 10000000
freq_demo[49] = 10000000
freq_demo[8] = 40000000
freq_demo[10] = 20000000
freq_demo[11] = 20000000
freq_demo[12] = 20000000
freq_demo[13] = 20000000
freq_demo[24] = 10000000
freq_demo[25] = 10000000
freq_demo[26] = 10000000
freq_demo[54] = 10000000
freq_demo[1] = 60000000
freq_demo[2] = 60000000
freq_demo[-4] = 100000
freq_demo[-8] = 100000
freq_demo[-9] = 100000
Getting the frequency of the repeated values of the expenses of the R-rated Drama movies, which will be stored in a dictionary called 'freq_demo1'.
freq_demo1 = Counter(freq_demo)
print(freq_demo1)#showing the freq_demo1 dictionary
Counter({10000000: 8, 2000000: 8, 20000000: 7, 55000000: 4, 12000000: 4, 3000000: 4, 13000000: 3, 5000000: 3, 1000000: 3, 100000: 3, 60000000: 2, 40000000: 2, 4000000: 2, 100000000: 1, 31000000: 1})
Sorting the 'freq_demo1' dictionary in ascending order.
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(100000, 3), (1000000, 3), (2000000, 8), (3000000, 4), (4000000, 2), (5000000, 3), (10000000, 8), (12000000, 4), (13000000, 3), (20000000, 7), (31000000, 1), (40000000, 2), (55000000, 4), (60000000, 2), (100000000, 1)]
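The counting and sorting steps above can be illustrated on a small list. This is a sketch with made-up values; only `Counter` and `sorted` are taken from the notebook:

```python
from collections import Counter

# Hypothetical rounded budgets, not the notebook's data
values = [2_000_000, 10_000_000, 2_000_000, 100_000, 10_000_000, 2_000_000]

freq = Counter(values)               # maps each distinct budget to its frequency
freq_sorted = sorted(freq.items())   # tuples sort by their first element, the budget
print(freq_sorted)
```

Because each item is a `(budget, frequency)` tuple, `sorted` orders by budget without needing an explicit `key`.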
Creating a list called 'cost_freq' with the cost of the R-rated Drama movies and creating another list called 'cost_freq_amount' with the frequency of the values in the 'cost_freq' list.
cost_freq = []
cost_freq_amount = []
for i in freq_one:
    cost_freq_amount.append(i[1])
    cost_freq.append("${:,.0f}".format(i[0]))
The 'cost_freq' list.
print(cost_freq)#showing the cost_freq list
['$100,000', '$1,000,000', '$2,000,000', '$3,000,000', '$4,000,000', '$5,000,000', '$10,000,000', '$12,000,000', '$13,000,000', '$20,000,000', '$31,000,000', '$40,000,000', '$55,000,000', '$60,000,000', '$100,000,000']
Checking the number of elements in the 'cost_freq' list.
len(cost_freq)
15
The 'cost_freq_amount' list.
print(cost_freq_amount)#showing the cost_freq_amount list
[3, 3, 8, 4, 2, 3, 8, 4, 3, 7, 1, 2, 4, 2, 1]
Checking the number of elements in the 'cost_freq_amount' list.
len(cost_freq_amount)
15
Creating a Frequency Distribution Table called 'freq_dis' of the expenses of all the R-rated movies.
freq_dis = pd.DataFrame({"Amount of Budget (x)":cost_freq,
"Frequency (f)":cost_freq_amount})
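The 'freq_dis' table.
freq_dis
| Amount of Budget (x) | Frequency (f) |
|---|---|
| $100,000 | 3 |
| $1,000,000 | 3 |
| $2,000,000 | 8 |
| $3,000,000 | 4 |
| $4,000,000 | 2 |
| $5,000,000 | 3 |
| $10,000,000 | 8 |
| $12,000,000 | 4 |
| $13,000,000 | 3 |
| $20,000,000 | 7 |
| $31,000,000 | 1 |
| $40,000,000 | 2 |
| $55,000,000 | 4 |
| $60,000,000 | 2 |
| $100,000,000 | 1 |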
Getting the Upper Values and Lower Values of the expenses of all the R-rated movies, for the Cumulative Frequency Distribution Table.
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
a =list(chunks(range(90001, 100000000), 10000000))
a
[range(90001, 10090001), range(10090001, 20090001), range(20090001, 30090001), range(30090001, 40090001), range(40090001, 50090001), range(50090001, 60090001), range(60090001, 70090001), range(70090001, 80090001), range(80090001, 90090001), range(90090001, 100000000)]
Finalizing the Lower Values for the Cumulative Frequency Distribution Table.
lower_val = ['$90,000','$10,080,001', '$20,080,002', '$30,080,003', '$40,080,004',
'$50,080,005', '$60,080,006', '$70,080,007', '$80,080,008', '$90,080,009' ]
print(lower_val)#showing the lower_val list
['$90,000', '$10,080,001', '$20,080,002', '$30,080,003', '$40,080,004', '$50,080,005', '$60,080,006', '$70,080,007', '$80,080,008', '$90,080,009']
Checking the number of elements in the 'lower_val' list.
len(lower_val)
10
Finalizing the Upper Values for the Cumulative Frequency Distribution Table.
upper_val = ['$10,080,000','$20,080,001','$30,080,002','$40,080,003','$50,080,004',
'$60,080,005','$70,080,006','$80,080,007','$90,080,008', '$100,080,009']
print(upper_val)#showing the upper_val list
['$10,080,000', '$20,080,001', '$30,080,002', '$40,080,003', '$50,080,004', '$60,080,005', '$70,080,006', '$80,080,007', '$90,080,008', '$100,080,009']
Checking the number of elements in the 'upper_val' list.
len(upper_val)
10
Getting the Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
for i in cost:
    if 90000 <= i <= 10080000:
        count1+=1
    if 10080001 <= i <= 20080001:
        count2+=1
    if 20080002 <= i <= 30080002:
        count3+=1
    if 30080003 <= i <= 40080003:
        count4+=1
    if 40080004 <= i <= 50080004:
        count5+=1
    if 50080005 <= i <= 60080005:
        count6+=1
    if 60080006 <= i <= 70080006:
        count7+=1
    if 70080007 <= i <= 80080007:
        count8+=1
    if 80080008 <= i <= 90080008:
        count9+=1
    if 90080009 <= i <= 100080009:
        count10+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10]
print(freq_amount)#showing the freq_amount list
[30, 11, 4, 3, 0, 5, 1, 0, 0, 1]
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
10
Getting the Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_amount_percent_demo = [count1/55*100,count2/55*100,count3/55*100,count4/55*100,
count5/55*100,count6/55*100,count7/55*100,count8/55*100,
count9/55*100,count10/55*100]
freq_amount_percent_demo1 = [55,20,7,5,0,9,2,0,0,2]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[55, 20, 7, 5, 0, 9, 2, 0, 0, 2]
Checking the number of elements in the 'freq_amount_percent_demo1' list.
len(freq_amount_percent_demo1)
10
Turning each integer in the 'freq_amount_percent_demo1' list into a string with the percentage symbol.
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['55%', '20%', '7%', '5%', '0%', '9%', '2%', '0%', '0%', '2%']
Checking the number of elements in the 'freq_amount_percent' list.
len(freq_amount_percent)
10
The 'Cumulative' function returns the cumulative sum of a list.
def Cumulative(lists):
    cu_list = []
    length = len(lists)
    # Element x of cu_list is the sum of the first x values of the input
    cu_list = [sum(lists[0:x:1]) for x in range(0, length+1)]
    return cu_list[1:]
Getting the Cumulative Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[30, 41, 45, 48, 48, 53, 54, 54, 54, 55]
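The same running total can be produced with the standard library. A minimal sketch, using the class frequencies printed earlier and `itertools.accumulate` in place of the notebook's 'Cumulative' helper:

```python
from itertools import accumulate

# Class frequencies printed earlier in this section
freq_amount = [30, 11, 4, 3, 0, 5, 1, 0, 0, 1]

# accumulate yields the running sum after each element
cumulative = list(accumulate(freq_amount))
print(cumulative)
```

`accumulate` adds each value once, whereas re-summing a growing slice repeats work for every element; for ten classes the difference is negligible, so this is a matter of readability rather than speed.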
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
10
Getting the Cumulative Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[55, 75, 82, 87, 87, 96, 98, 98, 98, 100]
Checking the number of elements in the 'freq_cumulative_percent_demo' list.
len(freq_cumulative_percent_demo)
10
Turning each integer in the 'freq_cumulative_percent_demo' list into a string with the percentage symbol.
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['55%', '75%', '82%', '87%', '87%', '96%', '98%', '98%', '98%', '100%']
Checking the number of elements in the 'freq_cumulative_percent' list.
len(freq_cumulative_percent)
10
Creating the Cumulative Frequency Distribution Table of the expenses of all the R-rated movies, using the necessary variables.
freq_cum_dis = pd.DataFrame({"Lower\nValue":lower_val,
"Upper\nValue":upper_val,
"Frequency (f)":freq_amount,
"Percentage (%)":freq_amount_percent,
"Cumulative\nFrequency":freq_cumulative_amount,
"Cumulative\nPercentage":freq_cumulative_percent})
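The 'freq_cum_dis' table.
freq_cum_dis
| Lower Value | Upper Value | Frequency (f) | Percentage (%) | Cumulative Frequency | Cumulative Percentage |
|---|---|---|---|---|---|
| $90,000 | $10,080,000 | 30 | 55% | 30 | 55% |
| $10,080,001 | $20,080,001 | 11 | 20% | 41 | 75% |
| $20,080,002 | $30,080,002 | 4 | 7% | 45 | 82% |
| $30,080,003 | $40,080,003 | 3 | 5% | 48 | 87% |
| $40,080,004 | $50,080,004 | 0 | 0% | 48 | 87% |
| $50,080,005 | $60,080,005 | 5 | 9% | 53 | 96% |
| $60,080,006 | $70,080,006 | 1 | 2% | 54 | 98% |
| $70,080,007 | $80,080,007 | 0 | 0% | 54 | 98% |
| $80,080,008 | $90,080,008 | 0 | 0% | 54 | 98% |
| $90,080,009 | $100,080,009 | 1 | 2% | 55 | 100% |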
Getting the Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
for i in cost:
    if i < 10000000:
        count1+=1
    if 10000000 <= i < 20000000:
        count2+=1
    if 20000000 <= i < 30000000:
        count3+=1
    if 30000000 <= i < 40000000:
        count4+=1
    if 40000000 <= i < 50000000:
        count5+=1
    if 50000000 <= i < 60000000:
        count6+=1
    if 60000000 <= i < 70000000:
        count7+=1
    if 70000000 <= i < 80000000:
        count8+=1
    if 80000000 <= i < 90000000:
        count9+=1
    if 90000000 <= i <= 100000000:
        count10+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10]
print(freq_amount)#showing the freq_amount list
[29, 9, 7, 2, 1, 4, 2, 0, 0, 1]
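Because these intervals all have the same width of 10 million, the bin for each budget can be computed directly instead of tested with ten conditions. The following is a sketch under that assumption; the function name `equal_width_counts` and the sample values are hypothetical:

```python
def equal_width_counts(costs, width=10_000_000, n_bins=10):
    """Count values in n_bins equal-width intervals starting at 0.

    The top edge (width * n_bins) is included in the last bin, matching the
    '90000000 <= i <= 100000000' condition above. Values outside
    [0, width * n_bins] are skipped in this sketch.
    """
    counts = [0] * n_bins
    for value in costs:
        if value < 0 or value > width * n_bins:
            continue
        index = min(int(value // width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical sample, not the notebook's data
sample_costs = [0, 9_999_999, 10_000_000, 95_000_000, 100_000_000]
bin_counts = equal_width_counts(sample_costs)
print(bin_counts)
```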
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
10
Getting the Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo = []
for i in freq_amount:cum_rel_freq_demo.append(i/55*100)
cum_rel_freq_demo1 = [53,16,13,3,2,7,4,0,0,2]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[53, 16, 13, 3, 2, 7, 4, 0, 0, 2]
Checking the number of elements in the 'cum_rel_freq_demo1' list.
len(cum_rel_freq_demo1)
10
Getting the Cumulative Relative Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[53, 69, 82, 85, 87, 94, 98, 98, 98, 100]
Checking the number of elements in the 'cum_rel_freq_demo2' list.
len(cum_rel_freq_demo2)
10
Turning each integer in the 'cum_rel_freq_demo2' list into a string with the percentage symbol.
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['53%', '69%', '82%', '85%', '87%', '94%', '98%', '98%', '98%', '100%']
Checking the number of elements in the 'cum_rel_freq_percent' list.
len(cum_rel_freq_percent)
10
Getting the Cumulative Frequency of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[29, 38, 45, 47, 48, 52, 54, 54, 54, 55]
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
10
Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.
intervals_cum = [ '< $10 Million','10 to < $20 Million','20 to < $30 Million',
'30 to < $40 Million',
'40 to < $50 Million', '50 to < $60 Million', '60 to < $70 Million',
'70 to < $80 Million',
'80 to < $90 Million','90 to <= $100 Million']
print(intervals_cum)#showing the intervals_cum list
['< $10 Million', '10 to < $20 Million', '20 to < $30 Million', '30 to < $40 Million', '40 to < $50 Million', '50 to < $60 Million', '60 to < $70 Million', '70 to < $80 Million', '80 to < $90 Million', '90 to <= $100 Million']
Checking the number of elements in the 'intervals_cum' list.
len(intervals_cum)
10
Creating the Cumulative Relative Frequency Distribution Table of the expenses of all the R-rated movies, using the necessary variables.
cum_rel_freq = pd.DataFrame({"Amount of Budget":intervals_cum,
"Frequency (f)":freq_amount,
"Cumulative Frequency":freq_cumulative_amount,
"Cumulative Relative Frequency Percentage":cum_rel_freq_percent,
})
The 'cum_rel_freq' table.
cum_rel_freq
| Amount of Budget | Frequency (f) | Cumulative Frequency | Cumulative Relative Frequency Percentage |
|---|---|---|---|
| < $10 Million | 29 | 29 | 53% |
| 10 to < $20 Million | 9 | 38 | 69% |
| 20 to < $30 Million | 7 | 45 | 82% |
| 30 to < $40 Million | 2 | 47 | 85% |
| 40 to < $50 Million | 1 | 48 | 87% |
| 50 to < $60 Million | 4 | 52 | 94% |
| 60 to < $70 Million | 2 | 54 | 98% |
| 70 to < $80 Million | 0 | 54 | 98% |
| 80 to < $90 Million | 0 | 54 | 98% |
| 90 to <= $100 Million | 1 | 55 | 100% |
Visualizing the Normal Distribution of the expenses of all the R-rated movies.
means = '16,455,866'
std = '20,757,148'
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-70, 70) # x values in millions of dollars
    s = [21] # standard deviation, in millions
    m = [16] # mean, in millions
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\mu={means}$\n$\sigma={std}$\n")
    plt.xlim(-70, 70)
    plt.ylim(0, .2)
    plt.legend(fontsize=11)
    plt.title('Variability of Cost of R-rated Movies, Normal\n Distribution, Mean = 16.5 million, StDev = 21 million',fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.savefig('variability_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
    plt.show()
if __name__ == '__main__':
    main()
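The curve drawn by 'make_gauss' is the normal probability density. As a sanity check, the sketch below evaluates the same formula with the standard library and compares it with `statistics.NormalDist`; the function name `gauss_density` is illustrative, and the mean and standard deviation are the rounded values (in millions) used in the plot.

```python
import math
from statistics import NormalDist

def gauss_density(x, mu, sigma):
    """Normal probability density, the same formula as make_gauss with N = 1."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 16, 21  # mean and standard deviation in millions of dollars
peak = gauss_density(mu, mu, sigma)        # the density is highest at the mean
reference = NormalDist(mu, sigma).pdf(mu)  # standard-library value for comparison
print(round(peak, 4), round(reference, 4))
```

With these parameters the peak density is a little under 0.02, so a y-axis limit near that value would probably show the shape of the curve more clearly than a limit of 0.2.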
Visualizing the Variance of the expenses of all the R-rated movies.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
import matplotlib.patches as mpatches
plt.ylabel('Cost of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of all the Budget\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_cost_r.index, y=df_cost_r['Cost'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_cost_r['Cost'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.annotate('CL - center line (Arithmetic Mean)',
xy=(176, 102),
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('Data points',
xy=(105, 180),
color ='black',
fontsize=11,
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',)
plt.savefig('variance_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
plt.show()# Telling matplotlib to show the chart
Visualizing the Variance of the expenses of all the R-rated movies, with lines at one, two and three standard deviations from the mean.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('Cost of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of all the Budget\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_cost_r.index, y=df_cost_r['Cost'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_cost_r['Cost'].mean(), xmin=0, xmax=55, color='blue') # Mean line
for std_int in [-3, -2, -1, 1, 2, 3]: # Going through different stds from the mean
    standard_deviation = df_cost_r['Cost'].mean() + df_cost_r['Cost'].std()*std_int
    if std_int in [1,2,-1,-2]:
        plt.hlines(y=standard_deviation,
                   xmin=0,
                   xmax=55,
                   linestyles='dashed',
                   colors='green'); # 1 or 2 stds above or below the mean
    if std_int ==-3:
        plt.hlines(y=standard_deviation,
                   xmin=0,
                   xmax=55,
                   colors='red',); # 3 stds below the mean (lower control limit)
    if std_int == +3:
        plt.hlines(y=standard_deviation,
                   xmin=0,
                   xmax=55,
                   colors='red'); # 3 stds above the mean (upper control limit)
    # Giving labels to the lines we just drew
    #plt.text(y=standard_deviation + 2, x=-10, s=std_int, ha='center')
plt.grid(False)
plt.annotate('UCL - upper control limit',
xy=(84, 238),
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('LCL - lower control limit',
xy=(84, 70),
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('+3 SD',
xy=(355, 240),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('+2 SD',
xy=(355, 210),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('+1 SD',
xy=(355, 180),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('CL - center line (Arithmetic Mean)',
xy=(176, 152),
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('-3 SD',
xy=(355, 70),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('-2 SD',
xy=(355, 98),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.annotate('-1 SD',
xy=(355, 125),
color ='purple',
xycoords='figure pixels',
horizontalalignment='left',
verticalalignment='top',
fontsize=11,)
plt.savefig('variance_std_cost_r',bbox_inches='tight',facecolor='white', transparent=False)
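The center line and control limits in this chart are the mean and the mean plus or minus a multiple of the standard deviation. A minimal sketch of that arithmetic with the standard library; the function name `control_limits` and the sample values are hypothetical:

```python
from statistics import mean, stdev

def control_limits(values, k=3):
    """Return (lower, center, upper) as mean - k*s, mean, mean + k*s."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation, like the pandas default for Series.std()
    return center - k * spread, center, center + k * spread

# Hypothetical values, not the notebook's data
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lcl, cl, ucl = control_limits(sample, k=3)
print(lcl, cl, ucl)
```

For strongly right-skewed data such as budgets, the lower limit can fall below zero even though no budget can be negative, so it is worth reading the lower lines with that in mind.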
Visualizing Pearson’s Coefficient of Skewness of the expenses of all the R-rated movies.
import matplotlib.pyplot as plt
# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=cost, bins='auto', color='#ff5500',
alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('Cost of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
plt.title('The Pearson’s Coefficient of Skewness for the Budget\n of all R-rated movies is 1.07 (n=55)',fontsize=14)
plt.savefig('skew_cost_r.png', bbox_inches='tight',facecolor='white', transparent=False)
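The notebook does not show how the skewness value in the title was computed. One common definition is Pearson's second (median-based) coefficient, 3 * (mean - median) / standard deviation; the sketch below assumes that definition, and the function name and sample values are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / std."""
    return 3 * (mean(values) - median(values)) / stdev(values)

# Hypothetical right-skewed values, not the notebook's data
sample = [1, 2, 2, 3, 3, 4, 20]
skew = pearson_median_skewness(sample)
print(round(skew, 2))
```

A positive result means the mean sits above the median, which is the pattern expected when a few large budgets stretch the right tail.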
Visualizing the Comparison of Mode, Median and Mean of the expenses of all the R-rated movies.
# An "interface" to matplotlib.axes.Axes.hist() method
median_cost = statistics.median(cost)
mean_cost = 16455866
mode_cost = statistics.mode(cost)
n, bins, patches = plt.hist(x=cost, bins='auto', color='#ff5500',
alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_cost, mean_cost, mode_cost]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement, linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);
plt.xlabel('Cost of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
plt.title('Comparison of Mode, Median and Mean in the\n Distribution of the Cost of all the R-rated Drama movies',fontsize=14)
plt.savefig('skewness2_cost_r', bbox_inches='tight')
Visualizing Chebyshev's Theorem for the expenses of all the R-rated movies, shading the region within two standard deviations of the mean.
from scipy.stats import norm
means = '16,455,866'
std = '20,757,148'
means1 = 16
std1 = 20
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-90, 90)
    s = [21]
    m = [16]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\mu={means}$\n$\sigma={std}$\n")
    # Shade the region within two standard deviations of the mean
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(50,0.0075), xytext=(50,0.0125),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-100, 100)
    plt.ylim(0, .02)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the Budget \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.savefig('cheb_cost_r',bbox_inches='tight')
    plt.show()
if __name__ == '__main__':
    main()
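The "at least 75%" figure comes from Chebyshev's inequality: for any distribution, at least 1 - 1/k² of the observations lie within k standard deviations of the mean. A minimal sketch of that calculation for the 55 movies; the function name `chebyshev_min_fraction` is illustrative:

```python
import math

def chebyshev_min_fraction(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

n = 55
within_2sd = chebyshev_min_fraction(2)   # 0.75
min_obs = math.floor(within_2sd * n)     # 0.75 * 55 = 41.25, shown as 41 in the plot
print(within_2sd, min_obs)
```

Since 0.75 × 55 = 41.25 and a count must be a whole number, the bound strictly guarantees 42 observations; 41 is the rounded-down figure.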
Visualizing Chebyshev's Theorem for the expenses of all the R-rated movies, shading the tail regions roughly between two and three standard deviations from the mean.
from scipy.stats import norm
means = '16,455,866'
std = '20,757,148'
means1 = 16
std1 = 20
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-90, 90)
    s = [21]
    m = [16]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\mu={means}$\n$\sigma={std}$\n")
    # Upper tail region
    x = np.linspace(57, 78)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    # Lower tail region; the density must be evaluated on x1, not on x
    x1 = np.linspace(-26, -40)
    y1 = norm.pdf(x1, means1, std1)
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')
    ax.annotate('at least 13.9%\n at least 8 obs', xy=(70,0.0025), xytext=(50,0.0075),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-100, 100)
    plt.ylim(0, .0199)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the Budget \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("Cost of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.savefig('cheb2_cost_r',bbox_inches='tight')
    plt.show()
if __name__ == '__main__':
    main()
Visualizing the KDE and Jittered plot of the expenses of all the R-rated movies.
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_cost_r, color='#ff5500');
sns.violinplot( data=df_cost_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the budget of the r-rated movies')
plt.savefig('violin_cost_r')
plt.show()
Visualizing the KDE and Swarm plot of the expenses of all the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_cost_r, color='#ff5500');
sns.violinplot( data=df_cost_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the budget of the r-rated movies')
#sns.despine()
plt.savefig('violin2_cost_r')
plt.show()
Visualizing the KDE and Rug plot of the expenses of all the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_cost_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_cost_r, split=True,inner=None,
scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the budget of the r-rated movies')
#sns.despine()
plt.savefig('violin3_cost_r')
plt.show()
Visualizing the KDE and Box plot of the expenses of all the R-rated movies.
sns.set(font_scale=.85)
plt.gcf().set_size_inches(4.2, 4)
sns.set_style("whitegrid")
ax = sns.violinplot( data=df_cost_r,color='#ff5500',fill=True,width=0.6,scale="width", inner=None)
sns.boxplot( data=df_cost_r, color='#ff5500', width=0.3, ax=ax).set(title='KDE and Box plot\n on the budget of the r-rated movies')
for violin, alpha in zip(ax.collections[::2], [0.3]):violin.set_alpha(alpha)
plt.savefig('violin4_cost_r')
Visualizing the Kernel Density Estimation of the expenses of all the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.displot(df_cost_r, x="Cost",color='#ff5500', kind="kde",
fill=True).set(title='KDE on the Budget of the R-rated Drama Movies')
plt.xlim(0, None)
plt.savefig('skewness3_cost_r')
plt.show()
The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Micro-Budgets.
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of Bernoulli distribution with parameter p = p_vals[0] (about 0.036)
#
bd = bernoulli(p_vals[0])#4%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.036)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli_dist_cost_r',bbox_inches='tight')
plt.show()
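The two bars in this chart are simply 1 - p and p. A minimal sketch of the Bernoulli probability mass function without SciPy, using the micro-budget proportion 2/55 computed earlier; the function name `bernoulli_pmf` is illustrative:

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable: p if x == 1, 1 - p if x == 0, else 0."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

p_micro = 2 / 55  # proportion of micro-budget movies computed earlier
bars = [bernoulli_pmf(x, p_micro) for x in (0, 1)]
print(bars)
```

The same function applies to the other three categories by swapping in their proportions.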
The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Low-Budgets.
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of Bernoulli distribution with parameter p = p_vals[1] (about 0.65)
#
bd = bernoulli(p_vals[1])#65%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.65)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli1_dist_cost_r',bbox_inches='tight')
plt.show()
The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Mid-Budgets.
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of Bernoulli distribution with parameter p = p_vals[2] (about 0.18)
#
bd = bernoulli(p_vals[2])#18%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.18)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli2_dist_cost_r',bbox_inches='tight')
plt.show()
The Bernoulli Distribution of the Budgets of R-rated Drama Movies that are High-Budgets.
import matplotlib.pyplot as plt
from scipy.stats import bernoulli
#
# Instance of Bernoulli distribution with parameter p = p_vals[3] (about 0.13)
#
bd = bernoulli(p_vals[3])#13%
#
# Outcome of experiment can take value as 0, 1
#
X = [0, 1]
#
# Create a bar plot; Note the usage of "pmf" function
# to determine the probability of different values of
# random variable
#
plt.figure(figsize=(7,7))
plt.rcParams['axes.facecolor'] = '#FFFAF0'
plt.xlim(-1, 2)
plt.bar(X, bd.pmf(X), color='#ff5500')
plt.title('Bernoulli Distribution (p=0.13)', fontsize='15')
plt.xlabel('Values of Random Variable X (0, 1)', fontsize='15')
plt.ylabel('Probability', fontsize='15')
plt.rcParams["axes.edgecolor"] = "black"
plt.rc('grid', linestyle="-", color='grey',alpha=0.2)
plt.grid(True)
plt.savefig('Bernoulli3_dist_cost_r',bbox_inches='tight')
plt.show()
Q-Q plot, or Quantile plot, for checking the distribution of the expenses of all the R-rated movies.
from scipy import stats
import matplotlib.style as style
plt.figure(figsize=(5,3))
stats.probplot(cost,plot=plt)
#ax = fig.subplots()
#ax = fig.add_subplot()
#fig, ax = plt.subplots()
#ax.get_lines()[0].set_markerfacecolor('C0')
plt.title("Distribution Plot of the Budgets of R-rated Drama Movies",fontsize=10)
plt.xlabel('Theoretical Quantiles',fontsize=10)
plt.ylabel('Ordered Values',fontsize=10)
plt.savefig('Probab_plot_r3',bbox_inches='tight',facecolor='white', transparent=False)
plt.show()
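A Q-Q plot pairs each sorted observation with the quantile a normal distribution would place at the same rank; points close to a straight line suggest the data are roughly normal. The sketch below shows the idea with the standard library. It assumes the simple plotting positions (i - 0.5) / n, which differ slightly from the positions `scipy.stats.probplot` uses, and the function name and sample values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted value with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, an assumption made for this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Hypothetical right-skewed values, not the notebook's data
sample = [1, 2, 2, 3, 3, 4, 20]
points = normal_qq_points(sample)
for q, v in points:
    print(f"{q:+.3f}  {v}")
```

In a right-skewed sample like this one, the largest value sits far above the line through the other points, which is the pattern to look for in the plot above.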
This is the HTML script for the Highcharts library, used to visualize the percentage of the Bernoulli Distribution for each Budget category (ranging from $100,000 to $50 Million) within the 'Drama_DataFrame' dataframe as a 'Pie Chart'. The chart itself is drawn with the JavaScript below. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<figure class="highcharts-figure">
<div id="containerr"></div>
</figure>
This is the JavaScript, using the Highcharts library, that visualizes the 'Bernoulli Distribution of the Budget' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe as a pie chart. It draws into the HTML container created above.
%%js
Highcharts.chart('containerr', {
chart: {
width:950,
height:500,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: 'The Bernoulli Distribution on the Budgets of R-rated Drama Movies'
},
tooltip: {
pointFormat: '{series.name}: <b>{point.percentage:.1f}%</b>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Budget Category',
colorByPoint: true,
colors: ['#ba450b','#ff5500','#e8946a','#f0c3ad'],
data: [{
name:'Micro Budget: <br>$0 to $100,000',
y: 3
}, {
name: 'Low Budget: <br>$100,000 to $15 Million',
y: 35
}, {
name: 'Mid Budget: <br>$15 Million to $50 Million',
y: 10,
sliced: true,
selected: true
}, {
name: 'High Budget: <br>$50 Million+',
y: 7
}]
}]
});
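Highcharts derives the percentage shown on each slice from the slice's share of the series total. As a rough cross-check, the sketch below recomputes those shares in Python from the four `y` values of the pie series. It assumes the `y` values are movie counts per budget category; the dictionary name `budget_counts` is illustrative.

```python
# Slice values copied from the pie-chart series above (assumed to be movie counts).
budget_counts = {
    "Micro Budget": 3,
    "Low Budget": 35,
    "Mid Budget": 10,
    "High Budget": 7,
}

total = sum(budget_counts.values())
shares = {name: count / total * 100 for name, count in budget_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```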
This is the HTML script, using the Highcharts library, that sets up the page for visualizing the percentages of the Bernoulli distribution for each budget category (comparing each sub-category both to the entire dataframe and to its main category) within the 'Drama_DataFrame' dataframe, using column charts. The charts themselves are drawn with JavaScript below. (The graphs below are interactive; you can hover over the column charts.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<table>
<tr>
<td><div id="container"></div></td>
<td><div id="container1"></div></td>
<td><div id="container2"></div></td>
</tr>
</table>
This is the JavaScript, using the Highcharts library, that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the Low-Budget category of the Drama movies within the 'Drama_DataFrame' dataframe as a column chart.
%%js
Highcharts.chart('container', {
chart: {
type: 'column',
},
title: {
text: 'The Bernoulli Distribution on <br>the Sub-groups in the Low-Budget Category'
},
subtitle: {
text: 'R-rated Drama Movies'
},
xAxis: {
type: 'category',
labels: {
style: {
fontSize: '13px',
fontFamily: 'Verdana, sans-serif'
}
}
},
yAxis: {
min: 0,
title: {
text: 'Probability (%)'
}
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.0f}%'
}
}
},
tooltip: {
pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
},
series: [{
color: '#ff5500',
name: 'Compared to the Sub-groups within the Low-Budget category',
data: [
['$1 Million to $5 Million', 57],
['$5 Million to $10 Million', 20],
['$10 Million to $15 Million', 23]
]
}, {
color: '#ba450b',
name: 'Compared to the entire Data set',
data: [
['$1 Million to $5 Million', 36],
['$5 Million to $10 Million', 13],
['$10 Million to $15 Million', 15]
]
}]
});
This is the JavaScript, using the Highcharts library, that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the Mid-Budget category of the Drama movies within the 'Drama_DataFrame' dataframe as a column chart.
%%js
Highcharts.chart('container1', {
chart: {
type: 'column',
},
title: {
text: 'The Bernoulli Distribution on <br>the Sub-groups in the Mid-Budget Category'
},
subtitle: {
text: 'R-rated Drama Movies'
},
xAxis: {
type: 'category',
labels: {
style: {
fontSize: '13px',
fontFamily: 'Verdana, sans-serif'
}
}
},
yAxis: {
min: 0,
title: {
text: 'Probability (%)'
}
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.0f}%'
}
}
},
tooltip: {
pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
},
series: [{
color: '#e8946a',
name: 'Compared to the Sub-groups within the Mid-Budget category',
data: [
['$15 Million to $20 Million', 30],
['$20 Million to $30 Million', 40],
['$30 Million to $50 Million', 30]
]
}, {
color: '#ba450b',
name: 'Compared to the entire Data set',
data: [
['$15 Million to $20 Million', 6],
['$20 Million to $30 Million', 7],
['$30 Million to $50 Million', 6]
]
}]
});
This is the JavaScript, using the Highcharts library, that visualizes the 'Bernoulli Distribution of the Budget' for the sub-groups of the High-Budget category of the Drama movies within the 'Drama_DataFrame' dataframe as a column chart.
%%js
Highcharts.chart('container2', {
chart: {
type: 'column',
},
title: {
text: 'The Bernoulli Distribution on <br>the Sub-groups in the High-Budget Category'
},
subtitle: {
text: 'R-rated Drama Movies'
},
xAxis: {
type: 'category',
labels: {
style: {
fontSize: '13px',
fontFamily: 'Verdana, sans-serif'
}
}
},
yAxis: {
min: 0,
title: {
text: 'Probability (%)'
}
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.0f}%'
}
}
},
tooltip: {
pointFormat: 'Percentage: <b>{point.y:.0f} %</b>'
},
series: [{
color: '#f0c3ad',
name: 'Compared to the Sub-groups within the High-Budget category',
data: [
['$50 Million to $60 Million', 72],
['$60 Million to $70 Million', 14],
['$90 Million to $100 Million', 14]
]
}, {
color: '#ba450b',
name: 'Compared to the entire Data set',
data: [
['$50 Million to $60 Million', 9],
['$60 Million to $70 Million', 2],
['$90 Million to $100 Million', 2]
]
}]
});
Styling the first portion of the Frequency Distribution Table of all the expenses spent on the R-rated movies.
freq_dis_cost_r = freq_dis[:8].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
    ("font-size" , "12pt")]},#heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},
    ])
Saving the freq_dis_cost_r dataframe to the freq_dis_cost_r.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_cost_r, 'freq_dis_cost_r.png')
The 'freq_dis_cost_r' dataframe.
Styling the second portion of the Frequency Distribution Table of all the expenses spent on the R-rated movies.
freq1_dis_cost_r = freq_dis[8:].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
    ("font-size" , "12pt")]},#heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},])
Saving the freq1_dis_cost_r dataframe to the freq1_dis_cost_r.png file as an image to be used for the analysis later on.
dfi.export(freq1_dis_cost_r, 'freq1_dis_cost_r.png')
The 'freq1_dis_cost_r' dataframe.
Styling the Cumulative Frequency Distribution Table of all the expenses spent on the R-rated movies.
freq_cum_dis_cost_r = freq_cum_dis.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
    ("font-size" , "12pt")]},#heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black"),
    ("font-size", "10pt")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]},])
Saving the freq_cum_dis_cost_r dataframe to the freq_cum_dis_cost_r.png file as an image to be used for the analysis later on.
dfi.export(freq_cum_dis_cost_r, 'freq_cum_dis_cost_r.png')
The 'freq_cum_dis_cost_r' dataframe.
Styling the Cumulative Relative Frequency Distribution Table of all the expenses spent on the R-rated movies.
cum_rel_freq_cost_r = cum_rel_freq.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
    ("font-size" , "12pt")]},#heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black"),
    ("font-size", "10pt")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the cum_rel_freq_cost_r dataframe to the cum_rel_freq_cost_r.png file as an image to be used for the analysis later on.
dfi.export(cum_rel_freq_cost_r, 'cum_rel_freq_cost_r.png')
The 'cum_rel_freq_cost_r' dataframe.
Cumulative Relative Frequency Distribution Line Plot of all the expenses spent on the R-rated movies.
# Set up the axes and figure
fig, ax = plt.subplots()
amount = [10000000, 20000000, 30000000, 40000000, 50000000, 60000000, 70000000, 80000000,
90000000, 100000000]
freq = [53, 69, 82, 85, 87, 94, 98, 98, 98, 100]
x = ['$10M to < $20M','$30M to < $40M','$50M to < $60M','$70M to < $80M','>= $100M']
plt.plot( amount, freq ,color='#ff5500', marker='o')
plt.title('Cumulative relative frequency (%) of \n the amount of budget spent on R-rated movies', fontsize=14)
plt.xlabel('Amount of Budget', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
#plt.xticks(x, rotation = 45)
plt.subplots_adjust(bottom=spacing)
plt.show()
Getting the ROI generated by all of the R-rated movies.
roi = []
for i in system_rating_r['Return On Investment']:
    i = int(i.replace('$', '').replace(',', ''))
    roi.append(i)
print(roi)#showing the roi list
[349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]
Checking the number of elements in the 'roi' list.
len(roi)
55
Putting the ROI of all the R-rated movies into a dataframe called df_roi_r.
df_roi_r = pd.DataFrame({"ROI":roi})
The 'df_roi_r' dataframe. (this dataframe is interactive)
df_roi_r
Getting the Arithmetic Mean of all the ROI generated by the R-rated movies.
x = statistics.mean(roi)
print("Arithmetic Mean of the ROI for the R-rated movies is:", x)
Arithmetic Mean of the ROI for the R-rated movies is: 59600710.27272727
Getting the Median of all the ROI generated by the R-rated movies.
print("Median of the ROI for the R-rated movies is:", statistics.median(roi))
Median of the ROI for the R-rated movies is: 17017873
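Getting the Mode of all the ROI generated by the R-rated movies. Every value in the 'roi' list occurs exactly once, so statistics.mode simply returns the first value in the list; the result below is therefore not a meaningful mode.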
print("Mode of the ROI for the R-rated movies is:",statistics.mode(roi))
Mode of the ROI for the R-rated movies is: 349948323
Getting the Standard Deviation of all the ROI generated by the R-rated movies.
print("Standard deviation of the ROI for the R-rated movies is:", np.std(roi, ddof=1))
Standard deviation of the ROI for the R-rated movies is: 111311472.60911952
Getting the Coefficient of Variation of all the ROI generated by the R-rated movies.
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the ROI for the R-rated movies is:", cv(roi))
Coefficient of Variation of the ROI for the R-rated movies is: 186.76199008328697
Getting Pearson’s Coefficient of Skewness of all the ROI generated by the R-rated movies.
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the ROI for the R-rated movies is:",
      pearsons( statistics.mean(roi),statistics.median(roi),np.std(roi, ddof=1)))
Pearson’s Coefficient of Skewness of the ROI for the R-rated movies is: 1.147667071720293
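The coefficient above is Pearson's second (median) skewness, 3 × (mean − median) / standard deviation. As a quick check of the arithmetic, the sketch below plugs in the mean, median and standard deviation printed in the earlier cells; the function and variable names are illustrative.

```python
def pearson_second_skewness(mean, median, standard_deviation):
    """Pearson's second (median) skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (mean - median) / standard_deviation


# Summary statistics as printed in the cells above.
mean_roi = 59600710.27272727
median_roi = 17017873
std_roi = 111311472.60911952

skew = pearson_second_skewness(mean_roi, median_roi, std_roi)
print(f"Pearson's skewness: {skew:.3f}")
```

A positive value means the mean sits above the median, which is what a long right tail of a few very high ROI values produces.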
Applying Chebyshev's Theorem to all the ROI generated by the R-rated movies.
def chebyshevs(mean, standard_deviation, num_std):
    # Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within k standard deviations of the mean
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0 # clamp the lower bound at 0
    upper_range = position_std + mean
    percent = (1 - 1/num_std**2)*100
    print(f'At least {percent:.1f}% of the ROI of the r-rated movies ranges from',lower_range,'to',upper_range)
chebyshevs(59600710, 111311472, 2)
chebyshevs(59600710, 111311472, 3)
At least 75.0% of the ROI of the r-rated movies ranges from 0 to 282223654 At least 88.9% of the ROI of the r-rated movies ranges from 0 to 393535126
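Chebyshev's theorem guarantees that at least 1 − 1/k² of any data set lies within k standard deviations of its mean (75% for k = 2, about 88.9% for k = 3). The sketch below compares that guaranteed minimum with the share actually observed in the ROI values. The list is copied from the `roi` output earlier in this section; the helper names are illustrative.

```python
from statistics import mean, stdev

# ROI values copied from the `roi` list printed earlier in this section.
roi = [349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435,
       530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873,
       26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713,
       31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242,
       12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930,
       14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931,
       15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173,
       1851683, 556082, 1500000, 2000000]


def chebyshev_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


def share_within(values, k):
    """Observed share of values within k sample standard deviations of the mean."""
    centre, spread = mean(values), stdev(values)
    inside = [v for v in values if abs(v - centre) <= k * spread]
    return len(inside) / len(values)


for k in (2, 3):
    print(f"k={k}: bound {chebyshev_bound(k):.1%}, observed {share_within(roi, k):.1%}")
```

The observed share can only be equal to or higher than the bound, since the theorem is a worst-case guarantee that holds for any distribution.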
Getting the Kurtosis of all the ROI generated by the R-rated movies.
print('Kurtosis of the ROI of the r-rated movies is:',kurtosis(roi, fisher=False))
print('Excess Kurtosis of the ROI of the r-rated movies is:',
(kurtosis(roi,fisher=False)-3))#leptokurtic
Kurtosis of the ROI of the r-rated movies is: 9.143154378370438 Excess Kurtosis of the ROI of the r-rated movies is: 6.143154378370438
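Kurtosis with `fisher=False` is the Pearson definition: the fourth central moment divided by the square of the second central moment, with 3 as the value for a normal distribution. The sketch below computes that ratio by hand on toy data, assuming population (biased) moments, which to my understanding is what `scipy.stats.kurtosis` uses by default.

```python
def pearson_kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    return m4 / m2**2


sample = [1, 2, 3, 4, 5]  # toy data, not taken from the notebook
kurt = pearson_kurtosis(sample)
print("kurtosis:", kurt, "excess kurtosis:", kurt - 3)
```

An excess kurtosis above 0, as reported for the ROI values, indicates heavier tails than a normal distribution (leptokurtic).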
Getting the Arithmetic Mean and the Trimmed Mean of all the ROI generated by the R-rated movies.
print("Arithmetic Mean of the ROI for the R-rated movies is:", statistics.mean(roi))
print('10% Trimmed mean of the ROI of the r-rated movies is:',stats.trim_mean(roi, 0.10))
Arithmetic Mean of the ROI for the R-rated movies is: 59600710.27272727 10% Trimmed mean of the ROI of the r-rated movies is: 31883546.111111112
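A trimmed mean sorts the data, drops the same number of values from each end, and averages what is left, which dampens the pull of extreme values. The sketch below is a minimal plain-Python version of that idea, assuming the number dropped per end is the proportion times the sample size, rounded down (so a 10% trim of 55 values would drop 5 from each end). The function name and toy data are illustrative.

```python
def trimmed_mean(values, proportion_to_cut):
    """Mean after dropping the same number of values from each end of the sorted data."""
    ordered = sorted(values)
    cut = int(proportion_to_cut * len(ordered))  # values removed from EACH end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


sample = [1, 2, 3, 4, 100]  # toy data with one extreme value
print("mean:", sum(sample) / len(sample))
print("20% trimmed mean:", trimmed_mean(sample, 0.2))
```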
Rounding the ROI generated by the R-rated movies to the nearest million and storing it in a list called 'freq_demo'.
freq_demo = []
for i in roi:
    freq_demo.append((round(i, -6)))
print(freq_demo)#showing the freq_demo list
[350000000, 308000000, 24000000, 326000000, 316000000, 20000000, 82000000, 531000000, 13000000, 130000000, 55000000, 10000000, 9000000, 17000000, 27000000, 8000000, 318000000, 25000000, 23000000, 8000000, 24000000, 31000000, 45000000, 60000000, 12000000, 69000000, 4000000, 12000000, 13000000, 0, 53000000, 37000000, 17000000, 36000000, 20000000, 15000000, 14000000, 9000000, 8000000, 0, 4000000, 19000000, 13000000, 16000000, 4000000, 0, 0, 3000000, 49000000, 69000000, 15000000, 2000000, 1000000, 2000000, 2000000]
Checking the number of elements in the 'freq_demo' list.
len(freq_demo)
55
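Replacing some elements in the 'freq_demo' list: the four values that rounded to 0 are replaced with their value to the nearest $100,000, and the largest values are grouped into round figures ($300 million and $350 million).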
freq_demo[-9] = 300000
freq_demo[-10] = 200000
freq_demo[-16] = 100000
freq_demo[-26] = 200000
freq_demo[1] = 300000000
freq_demo[4] = 300000000
freq_demo[3] = 350000000
freq_demo[16] = 300000000
Getting the Frequency of the Repeated Values of all the ROI generated by the R-rated Drama movies, which will be stored in a dictionary called 'freq_demo1'.
freq_demo1 = Counter((freq_demo))
print(freq_demo1)#showing the freq_demo1 list
Counter({300000000: 3, 13000000: 3, 8000000: 3, 4000000: 3, 2000000: 3, 350000000: 2, 24000000: 2, 20000000: 2, 9000000: 2, 17000000: 2, 12000000: 2, 69000000: 2, 200000: 2, 15000000: 2, 82000000: 1, 531000000: 1, 130000000: 1, 55000000: 1, 10000000: 1, 27000000: 1, 25000000: 1, 23000000: 1, 31000000: 1, 45000000: 1, 60000000: 1, 53000000: 1, 37000000: 1, 36000000: 1, 14000000: 1, 100000: 1, 19000000: 1, 16000000: 1, 300000: 1, 3000000: 1, 49000000: 1, 1000000: 1})
Sorting the 'freq_demo1' dictionary in ascending order.
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(100000, 1), (200000, 2), (300000, 1), (1000000, 1), (2000000, 3), (3000000, 1), (4000000, 3), (8000000, 3), (9000000, 2), (10000000, 1), (12000000, 2), (13000000, 3), (14000000, 1), (15000000, 2), (16000000, 1), (17000000, 2), (19000000, 1), (20000000, 2), (23000000, 1), (24000000, 2), (25000000, 1), (27000000, 1), (31000000, 1), (36000000, 1), (37000000, 1), (45000000, 1), (49000000, 1), (53000000, 1), (55000000, 1), (60000000, 1), (69000000, 2), (82000000, 1), (130000000, 1), (300000000, 3), (350000000, 2), (531000000, 1)]
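The rounding, counting and sorting steps above can be condensed into a few lines. The sketch below applies them to five values taken from the `roi` list; it is an illustration of the technique, not a replacement for the cells above, and the variable names are illustrative.

```python
from collections import Counter

# Five ROI values taken from the `roi` list above.
sample_roi = [24154026, 23830713, 12417298, 12499242, 12636004]

# Round to the nearest million, then count how often each rounded value occurs.
rounded = [round(value, -6) for value in sample_roi]
frequency = sorted(Counter(rounded).items())  # ascending by rounded value

for amount, count in frequency:
    print(f"${amount:,}: {count}")
```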
Creating a list called 'roi_freq_amount' with the frequency of the values from 'freq_one' list.
roi_freq_amount = []
for i in freq_one:
    roi_freq_amount.append(i[1])
print(roi_freq_amount)#showing the roi_freq_amount list
[1, 2, 1, 1, 3, 1, 3, 3, 2, 1, 2, 3, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 1]
Checking the number of elements in the 'roi_freq_amount' list.
len(roi_freq_amount)
36
Creating a list called 'roi_freq' with the ROI values of the R-rated Drama movies from the 'freq_one' list, formatted as dollar amounts.
roi_freq = []
for i in freq_one:
    roi_freq.append("${:,.0f}".format(i[0]))
print(roi_freq)#showing the roi_freq list
['$100,000', '$200,000', '$300,000', '$1,000,000', '$2,000,000', '$3,000,000', '$4,000,000', '$8,000,000', '$9,000,000', '$10,000,000', '$12,000,000', '$13,000,000', '$14,000,000', '$15,000,000', '$16,000,000', '$17,000,000', '$19,000,000', '$20,000,000', '$23,000,000', '$24,000,000', '$25,000,000', '$27,000,000', '$31,000,000', '$36,000,000', '$37,000,000', '$45,000,000', '$49,000,000', '$53,000,000', '$55,000,000', '$60,000,000', '$69,000,000', '$82,000,000', '$130,000,000', '$300,000,000', '$350,000,000', '$531,000,000']
Checking the number of elements in the 'roi_freq' list.
len(roi_freq)
36
Creating a Frequency Distribution Table called 'freq_dis_roi', of all the ROI generated by the R-rated movies.
freq_dis_roi = pd.DataFrame({"Return On Investment (x)":roi_freq,
"Frequency (f)":roi_freq_amount})
The 'freq_dis_roi' dataframe. (this dataframe is interactive)
freq_dis_roi
Getting the Upper Values and Lower Values of all the ROI generated on all of the R-rated movies, for the Cumulative Frequency Distribution Table.
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
a =list(chunks(range(70000, 350000000), 30000000))
a#showing the a list
[range(70000, 30070000), range(30070000, 60070000), range(60070000, 90070000), range(90070000, 120070000), range(120070000, 150070000), range(150070000, 180070000), range(180070000, 210070000), range(210070000, 240070000), range(240070000, 270070000), range(270070000, 300070000), range(300070000, 330070000), range(330070000, 350000000)]
vals = [70000, 30070000, 30070001, 60070001, 60070002, 90070002, 90070003, 120070003,
        120070004, 150070004, 150070005, 180070005, 180070006, 210070006, 210070007, 240070007, 240070008,
        270070008, 270070009, 300070009, 300070010, 330070010, 330070011, 360070011, 360070012,
        390070012, 390070013, 410070013, 410070014, 440070014, 440070015, 470070015, 470070016,
        500070016, 500070017, 530070017, 530070018, 560070019]
Finalizing the Lower Values for the Cumulative Frequency Distribution Table.
lower_vals = []
for i,x in enumerate(vals):
    if (i%2) == 0: lower_vals.append("${:,.0f}".format(x))
print(lower_vals)#showing the lower_vals list
['$70,000', '$30,070,001', '$60,070,002', '$90,070,003', '$120,070,004', '$150,070,005', '$180,070,006', '$210,070,007', '$240,070,008', '$270,070,009', '$300,070,010', '$330,070,011', '$360,070,012', '$390,070,013', '$410,070,014', '$440,070,015', '$470,070,016', '$500,070,017', '$530,070,018']
Checking the number of elements in the 'lower_vals' list.
len(lower_vals)
19
Finalizing the Upper Values for the Cumulative Frequency Distribution Table.
upper_vals = []
for i,x in enumerate(vals):
    if (i%2) != 0: upper_vals.append("${:,.0f}".format(x))
print(upper_vals)#showing the upper_vals list
['$30,070,000', '$60,070,001', '$90,070,002', '$120,070,003', '$150,070,004', '$180,070,005', '$210,070,006', '$240,070,007', '$270,070,008', '$300,070,009', '$330,070,010', '$360,070,011', '$390,070,012', '$410,070,013', '$440,070,014', '$470,070,015', '$500,070,016', '$530,070,017', '$560,070,019']
Checking the number of elements in the 'upper_vals' list.
len(upper_vals)
19
Getting the Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
# Pair every lower value with its upper value: 'vals' holds them alternately
class_bounds = list(zip(vals[0::2], vals[1::2]))
# Count the ROI values that fall inside each class (both ends inclusive)
freq_amount = [sum(low <= i <= high for i in roi) for low, high in class_bounds]
# Keep the individual counters available under the names count1 to count19
(count1, count2, count3, count4, count5, count6, count7, count8, count9, count10,
 count11, count12, count13, count14, count15, count16, count17, count18, count19) = freq_amount
print(freq_amount)#showing the freq_amount list
[37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1]
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
19
Getting the Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_amount_percent_demo = [count/55*100 for count in freq_amount]
# Rounding each percentage to the nearest whole number
freq_amount_percent_demo1 = [round(p) for p in freq_amount_percent_demo]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[67, 13, 7, 0, 2, 0, 0, 0, 0, 0, 7, 2, 0, 0, 0, 0, 0, 0, 2]
Checking the number of elements in the 'freq_amount_percent_demo1' list.
len(freq_amount_percent_demo1)
19
Turning the integers in the freq_amount_percent_demo1 list into strings with the percentage symbol.
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['67%', '13%', '7%', '0%', '2%', '0%', '0%', '0%', '0%', '0%', '7%', '2%', '0%', '0%', '0%', '0%', '0%', '0%', '2%']
Checking the number of elements in the 'freq_amount_percent' list.
len(freq_amount_percent)
19
Getting the Cumulative Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[37, 44, 48, 48, 49, 49, 49, 49, 49, 49, 53, 54, 54, 54, 54, 54, 54, 54, 55]
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
19
Getting the Cumulative Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[67, 80, 87, 87, 89, 89, 89, 89, 89, 89, 96, 98, 98, 98, 98, 98, 98, 98, 100]
Checking the number of elements in the 'freq_cumulative_percent_demo' list.
len(freq_cumulative_percent_demo)
19
Turning the integers in the freq_cumulative_percent_demo list into strings with the percentage symbol.
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['67%', '80%', '87%', '87%', '89%', '89%', '89%', '89%', '89%', '89%', '96%', '98%', '98%', '98%', '98%', '98%', '98%', '98%', '100%']
Checking the number of elements in the 'freq_cumulative_percent' list.
len(freq_cumulative_percent)
19
Creating the Cumulative Frequency Distribution Table of all the ROI generated by the R-rated movies, using the necessary variables.
freq_cum_dis1 = pd.DataFrame({"Lower\nValue":lower_vals,
"Upper\nValue":upper_vals,
"Frequency (f)":freq_amount,
"Percentage (%)":freq_amount_percent,
"Cumulative\nFrequency":freq_cumulative_amount,
"Cumulative\nPercentage":freq_cumulative_percent})
The 'freq_cum_dis1' table. (this table is interactive)
freq_cum_dis1
Getting the Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
class_width = 30000000
# 13 classes: below $30M, eleven $30M-wide classes up to $360M, and $360M or more
freq_amount = [0] * 13
for i in roi:
    # Class k covers class_width*k <= i < class_width*(k+1); the last class takes everything from $360M up.
    # This relies on every ROI value being a non-negative integer.
    freq_amount[min(i // class_width, 12)] += 1
print(freq_amount)#showing the freq_amount list
[37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 1]
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
13
Getting the Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo = [i/55*100 for i in freq_amount]
# Rounding each percentage to the nearest whole number
cum_rel_freq_demo1 = [round(p) for p in cum_rel_freq_demo]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[67, 13, 7, 0, 2, 0, 0, 0, 0, 0, 7, 2, 2]
Checking the number of elements in the 'cum_rel_freq_demo1' list.
len(cum_rel_freq_demo1)
13
Getting the Cumulative Relative Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[67, 80, 87, 87, 89, 89, 89, 89, 89, 89, 96, 98, 100]
Checking the number of elements in the 'cum_rel_freq_demo2' list.
len(cum_rel_freq_demo2)
13
Turning the integers in the cum_rel_freq_demo2 list into strings with the percentage symbol.
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['67%', '80%', '87%', '87%', '89%', '89%', '89%', '89%', '89%', '89%', '96%', '98%', '100%']
Checking the number of elements in the 'cum_rel_freq_percent' list.
len(cum_rel_freq_percent)
13
Getting the Cumulative Frequency of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[37, 44, 48, 48, 49, 49, 49, 49, 49, 49, 53, 54, 55]
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
13
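The `Cumulative` helper used above is defined outside this section. The sketch below shows one way such a running total can be written with `itertools.accumulate`; it is an assumption about the helper's behaviour, chosen because it reproduces the cumulative lists printed above. The lowercase name `cumulative` is illustrative.

```python
from itertools import accumulate

# Class frequencies as printed above for the 13 ROI intervals.
freq_amount = [37, 7, 4, 0, 1, 0, 0, 0, 0, 0, 4, 1, 1]


def cumulative(values):
    """Running total: element k is the sum of values[0] through values[k]."""
    return list(accumulate(values))


cumulative_frequency = cumulative(freq_amount)
# Dividing by the total gives the cumulative relative frequency in percent.
total = sum(freq_amount)
cumulative_relative = [round(c / total * 100) for c in cumulative_frequency]
print(cumulative_frequency)
print(cumulative_relative)
```

Computing the cumulative percentage from the cumulative counts, rather than by summing already rounded percentages, keeps the last class at exactly 100%.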
Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.
intervals_cum = [ '< $30 Million','30 to < $60 Million','60 to < $90 Million',
                  '90 to < $120 Million','120 to < $150 Million',
                  '150 to < $180 Million', '180 to < $210 Million', '210 to < $240 Million',
                  '240 to < $270 Million',
                  '270 to < $300 Million','300 to < $330 Million','330 to < $360 Million','>= $360 Million']
print(intervals_cum)#showing the intervals_cum list
['< $30 Million', '30 to < $60 Million', '60 to < $90 Million', '90 to < $120 Million', '120 to < $150 Million', '150 to < $180 Million', '180 to < $210 Million', '210 to < $240 Million', '240 to < $270 Million', '270 to < $300 Million', '300 to < $330 Million', '330 to < $360 Million', '>= $360 Million']
Checking the number of elements in the 'intervals_cum' list.
len(intervals_cum)
13
Creating the Cumulative Relative Frequency Distribution Table of all the ROI generated by the R-rated movies, using the necessary variables.
cum_rel_freq1 = pd.DataFrame({"Return On Investment":intervals_cum,
                              "Frequency (f)":freq_amount,
                              "Cumulative Frequency":freq_cumulative_amount,
                              "Cumulative Relative Frequency Percentage":cum_rel_freq_percent,
                              })
The 'cum_rel_freq1' table. (this table is interactive)
cum_rel_freq1
Visualizing The Normal Distribution of all the ROI generated by all of the R-rated movies.
means = '59,600,710'
std = '111,311,472'
def make_gauss(N, sig, mu):
    # Normal density scaled by N; x, sig and mu are expressed in millions of dollars
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-350, 350)
    m = [60]   # mean ROI, in millions
    s = [111]  # standard deviation of the ROI, in millions
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    plt.xlim(-350, 350)
    plt.ylim(0, .004)  # the peak density is about 0.0036 for sigma = 111
    plt.legend(fontsize=11)
    plt.title('Variability of ROI of R-rated Movies\n Normal Distribution, Mean = 59.6 million, StDev = 111.3 million',fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.show()
if __name__ == '__main__':
    main()
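The y-axis limit for a curve like this can be sanity-checked from the peak of the normal density, which equals 1/(σ√(2π)) and is reached at x = μ. The following is a minimal sketch using only the standard library; the helper names `gauss_pdf` and `peak_density` are illustrative and are not part of the notebook.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def peak_density(sigma):
    """Maximum of the normal density, reached at x == mu."""
    return 1 / (sigma * math.sqrt(2 * math.pi))

# Values in millions of dollars, matching the plotted curve (mu = 60, sigma = 111)
peak = peak_density(111)
```

With σ = 111 the peak is roughly 0.0036, so a y-limit far above that value flattens the curve against the x-axis.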
Visualizing The Variance of all the ROI generated by all of the R-rated movies.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of all the ROI\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.show()# Telling matplotlib to show the chart
Visualizing The Variance using Two Standard Deviations of all the ROI generated by all of the R-rated movies.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of all the ROI\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_roi_r.index, y=df_roi_r['ROI'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_r['ROI'].mean(), xmin=0, xmax=55, color='blue') # Mean line
for std_int in [-2, -1, 1, 2]: # Going through different stds from the mean
    standard_deviation = df_roi_r['ROI'].mean() + df_roi_r['ROI'].std()*std_int
    plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               linestyles='dashed',
               colors='green'); # Line at std_int standard deviations from the mean
    # Giving labels to the lines we just drew
    plt.text(y=standard_deviation + 2, x=-10, s=std_int, ha='center')
plt.grid(False)
Visualizing The Pearson’s Coefficient of Skewness of all the ROI generated by all of the R-rated movies.
import matplotlib.pyplot as plt
# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=roi, bins='auto', color='#ff5500',
alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('ROI of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
#plt.text( x=np.min(cost), y=0.1, s=r'$\mu=16 million, b=20 million$')
plt.title('The Pearson’s Coefficient of Skewness for the ROI\n of all R-rated movies is 1.14 (n=55)',fontsize=14)
Text(0.5, 1.0, 'The Pearson’s Coefficient of Skewness for the ROI\n of all R-rated movies is 1.14 (n=55)')
Visualizing The Comparison of Mode, Median and Mean of all the ROI generated by all of the R-rated movies.
# An "interface" to matplotlib.axes.Axes.hist() method
median_roi = statistics.median(roi)
mean_roi = 59600710
mode_roi = statistics.mode(roi)
n, bins, patches = plt.hist(x=roi, bins='auto', color='#ff5500',
alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_roi, mean_roi, mode_roi]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement, linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);
plt.title('Comparison of Mode, Median and Mean \nin the Distribution of the ROI of all the R-rated Drama movies',fontsize=14)
Text(0.5, 1.0, 'Comparison of Mode, Median and Mean \nin the Distribution of the ROI of all the R-rated Drama movies')
Visualizing The Chebyshev's Theorem of all the ROI generated by all of the R-rated movies.
means = '59,600,710'
std = '111,311,472'
means1 = 60
std1 = 111
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-400, 400)
    s = [111]
    m = [60]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    # Shade the region within two standard deviations of the mean
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(250,0.0035), xytext=(250,0.0020),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-400, 400)
    plt.ylim(0, .004)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.show()
if __name__ == '__main__':
    main()
Visualizing The Chebyshev's Theorem of all the ROI generated by all of the R-rated movies, for the tails beyond two standard deviations.
means = '59,600,710'
std = '111,311,472'
means1 = 60
std1 = 111
def make_gauss(N, sig, mu):
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-400, 400)
    s = [111]
    m = [60]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    # Shade the upper tail
    x = np.linspace(280, 350)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    # Shade the lower tail; the density must be evaluated on x1, not on x
    x1 = np.linspace(-170, -250)
    y1 = norm.pdf(x1, means1, std1)
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')
    ax.annotate('at least 13.9%\n at least 8 obs',xy=(250,0.0035), xytext=(250,0.0020),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-400, 400)
    plt.ylim(0, .004)
    plt.legend(fontsize=10)
    plt.title("Chebyshev's Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)",fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.show()
if __name__ == '__main__':
    main()
Visualizing The KDE and Jittered plot of all the ROI generated by all of the R-rated movies.
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the ROI of the r-rated movies')
plt.show()
Visualizing The KDE and Swarm plot of all the ROI generated by all of the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the ROI of the r-rated movies')
#sns.despine()
plt.show()
Visualizing The KDE and Rug plot of all the ROI generated by all of the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_roi_r, split=True,inner=None,
scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the ROI of the r-rated movies')
#sns.despine()
plt.show()
Styling the first portion of the Frequency Distribution Table of all the ROI generated by all of the R-rated movies.
freq_dis_roi1 = freq_dis_roi[:12].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                  ("font-size" , "12pt")]},# heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_dis_roi1 dataframe to the freq_dis_roi1.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_roi1, 'freq_dis_roi1.png')
The 'freq_dis_roi1' dataframe.
Styling the second portion of the Frequency Distribution Table of all the ROI generated by all of the R-rated movies.
freq_dis_roi2 = freq_dis_roi[12:24].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                  ("font-size" , "12pt")]},# heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_dis_roi2 dataframe to the freq_dis_roi2.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_roi2, 'freq_dis_roi2.png')
The 'freq_dis_roi2' dataframe.
Styling the last portion of the Frequency Distribution Table of all the ROI generated by all of the R-rated movies.
freq_dis_roi3 = freq_dis_roi[24:].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                  ("font-size" , "12pt")]},# heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_dis_roi3 dataframe to the freq_dis_roi3.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_roi3, 'freq_dis_roi3.png')
The 'freq_dis_roi3' dataframe.
Styling the Cumulative Frequency Distribution Table of all the ROI generated by all of the R-rated movies.
freq_cum_dis11 = freq_cum_dis1.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                  ("font-size" , "12pt")]},# heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                               ("font-size", "10pt")]},# inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_cum_dis11 dataframe to the freq_cum_dis11.png file as an image to be used for the analysis later on.
dfi.export(freq_cum_dis11, 'freq_cum_dis11.png')
The 'freq_cum_dis11' dataframe.
Styling the Cumulative Relative Frequency Distribution Table of all the ROI generated by all of the R-rated movies.
cum_rel_freq11 = cum_rel_freq1.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
    {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                  ("font-size" , "12pt")]},# heading
    {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                               ("font-size", "10pt")]},# inside chart
    {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
    {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the cum_rel_freq11 dataframe to the cum_rel_freq11.png file as an image to be used for the analysis later on.
dfi.export(cum_rel_freq11, 'cum_rel_freq11.png')
The 'cum_rel_freq11' dataframe.
Cumulative Relative Frequency Distribution Line Plot of all the ROI generated by all of the R-rated movies.
amount = [30000000, 60000000, 90000000, 120000000, 150000000, 180000000, 210000000, 240000000,
270000000, 300000000, 330000000, 360000000, 531000000]
freq = [67, 80, 87, 87, 89, 89, 89, 89, 89, 89, 96, 98, 100]
plt.plot( amount, freq ,color='red', marker='o')
plt.title('Cumulative relative frequency (%) of \n the ROI made on R-rated movies', fontsize=14)
plt.xlabel('Return On Investment', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
plt.show()
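The percentages plotted above can be derived from the cumulative frequencies rather than typed in by hand. A minimal sketch, assuming the 13 cumulative counts printed earlier for the ROI intervals; the helper name `cumulative_relative_percent` is hypothetical.

```python
# Cumulative frequencies printed earlier for the 13 ROI intervals
freq_cumulative_amount = [37, 44, 48, 48, 49, 49, 49, 49, 49, 49, 53, 54, 55]

def cumulative_relative_percent(cumulative_counts):
    """Express each cumulative count as a whole-number percentage of the total."""
    total = cumulative_counts[-1]  # the last cumulative count is the sample size
    return [round(100 * count / total) for count in cumulative_counts]

freq = cumulative_relative_percent(freq_cumulative_amount)
```

Because each percentage is computed from the running total, the series always ends at 100 without manual adjustment.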
Getting the ROI Percentage of all of the R-rated movies.
roi_per = []
for i in system_rating_r["ROI Percentage"]:
    i = int(i.replace('%', ''))
    roi_per.append(i)
print(roi_per) #showing the roi_per list
[350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133, 41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5, 1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557, 444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]
Checking the number of elements in the 'roi_per' list.
len(roi_per)
55
Putting the roi_per of all the R-rated movies into a dataframe called df_roi_per_r.
df_roi_per_r = pd.DataFrame({"ROI Percentage":roi_per})
The 'df_roi_per_r' dataframe. (this dataframe is interactive)
df_roi_per_r
| ROI Percentage |
|---|
Getting the Arithmetic Mean of all the ROI Percentage of all of the R-rated movies.
x = statistics.mean(roi_per)
print("Arithmetic Mean of the ROI for the R-rated movies is:", x)
Arithmetic Mean of the ROI for the R-rated movies is: 508.8727272727273
Getting the Median of all the ROI Percentage of all of the R-rated movies.
print("Median of the ROI for the R-rated movies is:", statistics.median(roi_per))
Median of the ROI for the R-rated movies is: 263
Getting the Standard Deviation of all the ROI Percentage of all of the R-rated movies.
print("Standard deviation of the ROI Percentage for the R-rated movies is:", np.std(roi_per, ddof=1))
Standard deviation of the ROI Percentage for the R-rated movies is: 590.4479797873956
Getting the Coefficient of Variation of all the ROI Percentage of all of the R-rated movies.
cv = lambda x: np.std(x, ddof=1) / np.mean(x) * 100
print("Coefficient of Variation of the ROI Percentage for the R-rated movies is:", cv(roi_per))
Coefficient of Variation of the ROI Percentage for the R-rated movies is: 116.03058056419451
Getting the Pearson’s Coefficient of Skewness of all the ROI Percentage of all of the R-rated movies.
def pearsons(mean, median, standard_deviation):
    skewness = (mean-median)*3/standard_deviation
    return skewness
print("Pearson’s Coefficient of Skewness of the ROI Percentage for the R-rated movies is:",
      pearsons( statistics.mean(roi_per),statistics.median(roi_per),np.std(roi_per, ddof=1)))
Pearson’s Coefficient of Skewness of the ROI Percentage for the R-rated movies is: 1.2492517665718463
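The same coefficient can be reproduced with the standard library alone, which makes the formula 3 × (mean − median) / standard deviation easy to check by hand. A minimal sketch, assuming the `roi_per` values printed above; `statistics.stdev` is the sample standard deviation, the counterpart of `np.std(..., ddof=1)`.

```python
import statistics

# ROI Percentage values printed above for the 55 R-rated movies
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

mean = statistics.mean(roi_per)
median = statistics.median(roi_per)
stdev = statistics.stdev(roi_per)  # sample standard deviation (n - 1 denominator)

# Pearson's second coefficient of skewness
skew = 3 * (mean - median) / stdev
```

A positive value means the mean sits above the median, which is what a long right tail of very high ROI percentages produces.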
Applying Chebyshev's Theorem to all the ROI Percentage of all of the R-rated movies.
def chebyshevs(mean, standard_deviation, num_std, previous_p):
    position_std = num_std*standard_deviation
    lower_range = mean - position_std
    if lower_range < 0: lower_range = 0  # clamp the lower bound at 0
    upper_range = position_std + mean
    if num_std == 2:
        print('At least 75% of the ROI Percentage of the r-rated movies ranges from',lower_range,'to',upper_range)
    if num_std == 3:
        print('At least 13.9% of the ROI Percentage of the r-rated movies ranges from',previous_p,'to',upper_range)
chebyshevs(510, 590, 2, 0)
chebyshevs(510, 590, 3, 1690)
At least 75% of the ROI Percentage of the r-rated movies ranges from 0 to 1690
At least 13.9% of the ROI Percentage of the r-rated movies ranges from 1690 to 2280
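Chebyshev's theorem states that at least 1 − 1/k² of any data set lies within k standard deviations of its mean (k > 1). The sketch below computes that bound and compares it with the share actually observed in `roi_per`; the helper names are hypothetical, and the data are the values printed earlier.

```python
import statistics

roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

bound_2sd = chebyshev_lower_bound(2)      # guaranteed minimum share
observed_2sd = share_within(roi_per, 2)   # share actually found in the data
```

The theorem only bounds the share within k standard deviations. Subtracting the k = 2 bound from the k = 3 bound (88.9% − 75% = 13.9%) does not by itself guarantee a minimum share in the band between them, so that figure is best read as a difference of bounds.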
Getting the Kurtosis of all the ROI Percentage of all of the R-rated movies.
print('Kurtosis of the ROI Percentage of the r-rated movies is:',kurtosis(roi_per, fisher=False))
print('Excess Kurtosis of the ROI Percentage of the r-rated movies is:',
      (kurtosis(roi_per,fisher=False)-3))#leptokurtic
Kurtosis of the ROI Percentage of the r-rated movies is: 6.600168491255548
Excess Kurtosis of the ROI Percentage of the r-rated movies is: 3.6001684912555483
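Pearson's kurtosis is the fourth central moment divided by the squared second central moment, and excess kurtosis subtracts 3 (the value for a normal distribution). A minimal standard-library sketch of that definition, using population moments (divide by n); the function name `pearson_kurtosis` is hypothetical.

```python
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

def pearson_kurtosis(data):
    """Fourth central moment over the squared second central moment."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

kurt = pearson_kurtosis(roi_per)
excess_kurt = kurt - 3  # positive values indicate heavier tails than a normal curve
```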
Getting the Arithmetic Mean and the Trimmed Mean of all the ROI Percentage of all of the R-rated movies.
print("Arithmetic Mean of the ROI for the R-rated movies is:", statistics.mean(roi_per))
print('10% Trimmed mean of the ROI of the r-rated movies is:',stats.trim_mean(roi_per, 0.10))
Arithmetic Mean of the ROI for the R-rated movies is: 508.8727272727273
10% Trimmed mean of the ROI of the r-rated movies is: 401.55555555555554
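A 10% trimmed mean drops the lowest and highest 10% of the sorted values before averaging, which dampens the effect of a few extreme ROI percentages. The sketch below is a hand-rolled version for illustration; the function name `trimmed_mean` is hypothetical, and the number of values cut from each end is rounded down.

```python
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

def trimmed_mean(data, proportion):
    """Mean after cutting `proportion` of the sorted values from each end."""
    ordered = sorted(data)
    cut = int(proportion * len(ordered))          # 10% of 55 rounds down to 5
    kept = ordered[cut:len(ordered) - cut]        # the middle 45 values
    return sum(kept) / len(kept)

trimmed = trimmed_mean(roi_per, 0.10)
```

Here the five smallest and five largest values are removed, which pulls the average down from about 509 to about 402.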
Rounding the ROI Percentage of the R-rated movies to approximate values and storing them in a list called 'freq_demo'.
freq_demo = [5, 5, 10, 20, 20, 20, 40, 40, 40, 40, 40, 40, 50, 70, 70, 80, 130, 130,
150, 180, 200, 220, 240, 300, 300, 300, 400, 400, 400, 400, 500, 500, 500,
500, 600, 600, 600, 700, 700, 700, 800, 800, 800, 900, 1000, 1100, 1300,
1300, 2000, 2000, 2500, 3000]
print(freq_demo) #showing the freq_demo list
[5, 5, 10, 20, 20, 20, 40, 40, 40, 40, 40, 40, 50, 70, 70, 80, 130, 130, 150, 180, 200, 220, 240, 300, 300, 300, 400, 400, 400, 400, 500, 500, 500, 500, 600, 600, 600, 700, 700, 700, 800, 800, 800, 900, 1000, 1100, 1300, 1300, 2000, 2000, 2500, 3000]
Checking the number of elements in the 'freq_demo' list.
len(freq_demo)
52
Getting the Frequency of the Repeated Values of all the ROI Percentage of the R-rated Drama movies, which will be stored in a dictionary called 'freq_demo1'.
freq_demo1 = Counter((freq_demo))
print(freq_demo1)#showing the freq_demo1 list
Counter({40: 6, 400: 4, 500: 4, 20: 3, 300: 3, 600: 3, 700: 3, 800: 3, 5: 2, 70: 2, 130: 2, 1300: 2, 2000: 2, 10: 1, 50: 1, 80: 1, 150: 1, 180: 1, 200: 1, 220: 1, 240: 1, 900: 1, 1000: 1, 1100: 1, 2500: 1, 3000: 1})
Sorting the 'freq_demo1' dictionary in ascending order.
freq_one = sorted(freq_demo1.items(), key=lambda i: i[0])
print(freq_one)#showing the freq_one list
[(5, 2), (10, 1), (20, 3), (40, 6), (50, 1), (70, 2), (80, 1), (130, 2), (150, 1), (180, 1), (200, 1), (220, 1), (240, 1), (300, 3), (400, 4), (500, 4), (600, 3), (700, 3), (800, 3), (900, 1), (1000, 1), (1100, 1), (1300, 2), (2000, 2), (2500, 1), (3000, 1)]
Creating a list called 'roi_per_freq' with the ROI Percentage of the R-rated Drama movies from the 'freq_one' list.
roi_per_freq = []
for i in freq_one:
    roi_per_freq.append("{:}%".format(i[0]))
print(roi_per_freq)#showing the roi_per_freq list
['5%', '10%', '20%', '40%', '50%', '70%', '80%', '130%', '150%', '180%', '200%', '220%', '240%', '300%', '400%', '500%', '600%', '700%', '800%', '900%', '1000%', '1100%', '1300%', '2000%', '2500%', '3000%']
Checking the number of elements in the 'roi_per_freq' list.
len(roi_per_freq)
26
Creating a list called 'roi_per_freq_amount' with the frequency of the values from 'freq_one' list.
roi_per_freq_amount = []
for i in freq_one:
    roi_per_freq_amount.append(i[1])
print(roi_per_freq_amount)#showing the roi_per_freq_amount list
[2, 1, 3, 6, 1, 2, 1, 2, 1, 1, 1, 1, 1, 3, 4, 4, 3, 3, 3, 1, 1, 1, 2, 2, 1, 1]
Checking the number of elements in the 'roi_per_freq_amount' list.
len(roi_per_freq_amount)
26
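The counting, sorting, and splitting steps above can also be written in a few lines with `collections.Counter` and list comprehensions. This is a compact sketch of the same steps, reusing the notebook's variable names and the `freq_demo` values printed earlier.

```python
from collections import Counter

# Rounded ROI Percentage values printed earlier (52 entries)
freq_demo = [5, 5, 10, 20, 20, 20, 40, 40, 40, 40, 40, 40, 50, 70, 70, 80, 130, 130,
             150, 180, 200, 220, 240, 300, 300, 300, 400, 400, 400, 400, 500, 500, 500,
             500, 600, 600, 600, 700, 700, 700, 800, 800, 800, 900, 1000, 1100, 1300,
             1300, 2000, 2000, 2500, 3000]

# (value, count) pairs sorted by value; tuples sort on their first element
freq_one = sorted(Counter(freq_demo).items())

# Split the pairs into a label list and a frequency list
roi_per_freq = [f"{value}%" for value, _ in freq_one]
roi_per_freq_amount = [count for _, count in freq_one]
```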
Creating a Frequency Distribution Table called 'freq_dis3', of all the ROI Percentage of all of the R-rated movies.
freq_dis3 = pd.DataFrame({"ROI\nPercentage (x)":roi_per_freq,
                          "Frequency (f)":roi_per_freq_amount})
The 'freq_dis3' dataframe. (this dataframe is interactive)
freq_dis3
| ROI Percentage (x) | Frequency (f) |
|---|---|
Getting the Upper Values and Lower Values of all the ROI Percentage of all of the R-rated movies, for the Cumulative Frequency Distribution Table.
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
#a =list(chunks(range(0, 2700), 150))
a =list(chunks(range(0, 2800), 200))
a#showing the a list
[range(0, 200), range(200, 400), range(400, 600), range(600, 800), range(800, 1000), range(1000, 1200), range(1200, 1400), range(1400, 1600), range(1600, 1800), range(1800, 2000), range(2000, 2200), range(2200, 2400), range(2400, 2600), range(2600, 2800)]
Finalizing the Lower Values for the Cumulative Frequency Distribution Table.
lower_val = ['0%', '151%', '302%', '453%', '604%', '755%', '906%', '1057%', '1208%', '1359%',
'1510%', '1661%', '1812%', '1963%', '2114%', '2265%', '2416%', '2567%' ]
print(lower_val)#showing the lower_val list
['0%', '151%', '302%', '453%', '604%', '755%', '906%', '1057%', '1208%', '1359%', '1510%', '1661%', '1812%', '1963%', '2114%', '2265%', '2416%', '2567%']
Checking the number of elements in the 'lower_val' list.
len(lower_val)
18
Finalizing the Upper Values for the Cumulative Frequency Distribution Table.
upper_val = ['150%', '301%', '452%', '603%', '754%', '905%', '1056%', '1207%','1358%', '1509%',
'1660%', '1811%', '1962%', '2113%', '2264%', '2415%', '2566%', '2717%']
print(upper_val)#showing the upper_val list
['150%', '301%', '452%', '603%', '754%', '905%', '1056%', '1207%', '1358%', '1509%', '1660%', '1811%', '1962%', '2113%', '2264%', '2415%', '2566%', '2717%']
Checking the number of elements in the 'upper_val' list.
len(upper_val)
18
Getting the Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0
count13 = 0
count14 = 0
count15 = 0
count16 = 0
count17 = 0
count18 = 0
for i in roi_per:
    if 0 <= i <= 150:
        count1+=1
    if 151 <= i <= 301:
        count2+=1
    if 302 <= i <= 452:
        count3+=1
    if 453 <= i <= 603:
        count4+=1
    if 604 <= i <= 754:
        count5+=1
    if 755 <= i <= 905:
        count6+=1
    if 906 <= i <= 1056:
        count7+=1
    if 1057 <= i <= 1207:
        count8+=1
    if 1208 <= i <= 1358:
        count9+=1
    if 1359 <= i <= 1509:
        count10+=1
    if 1510 <= i <= 1660:
        count11+=1
    if 1661 <= i <= 1811:
        count12+=1
    if 1812 <= i <= 1962:
        count13+=1
    if 1963 <= i <= 2113:
        count14+=1
    if 2114 <= i <= 2264:
        count15+=1
    if 2265 <= i <= 2415:
        count16+=1
    if 2416 <= i <= 2566:
        count17+=1
    if 2567 <= i <= 2717:
        count18+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10,
count11,count12,count13,count14,count15,count16,count17,count18]
print(freq_amount)#showing the freq_amount list
[18, 10, 5, 6, 3, 4, 2, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1]
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
18
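Because the 18 classes all have the same width (151 percentage points, starting at 0), the class index of a value can be found with integer division instead of 18 separate counters. A minimal sketch, assuming the `roi_per` values printed earlier; `BIN_WIDTH` and `N_BINS` are illustrative names.

```python
roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

BIN_WIDTH = 151  # classes 0-150, 151-301, ..., 2567-2717
N_BINS = 18

freq_amount = [0] * N_BINS
for value in roi_per:
    index = value // BIN_WIDTH      # e.g. 263 // 151 == 1, the 151-301 class
    if 0 <= index < N_BINS:         # ignore anything outside the table's range
        freq_amount[index] += 1
```

Adding or removing classes then only requires changing `N_BINS`, not editing a chain of conditions.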
Getting the Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_amount_percent_demo = [count1/55*100,count2/55*100,count3/55*100,count4/55*100,
count5/55*100,count6/55*100,count7/55*100,count8/55*100,
count9/55*100,count10/55*100,count11/55*100,count12/55*100,
count13/55*100,count14/55*100,count15/55*100,count16/55*100,
count17/55*100,count18/55*100,]
freq_amount_percent_demo1 = [33, 18, 9, 11, 5, 7, 3, 2, 4, 0, 2, 0, 2, 0, 0, 0, 2, 2]
print(freq_amount_percent_demo1)#showing the freq_amount_percent_demo1 list
[33, 18, 9, 11, 5, 7, 3, 2, 4, 0, 2, 0, 2, 0, 0, 0, 2, 2]
Checking the number of elements in the 'freq_amount_percent_demo1' list.
len(freq_amount_percent_demo1)
18
Turning the integers in the freq_amount_percent_demo1 list into strings with the percentage symbol.
freq_amount_percent = []
for i in freq_amount_percent_demo1:
    freq_amount_percent.append("{:}%".format(i))
print(freq_amount_percent)#showing the freq_amount_percent list
['33%', '18%', '9%', '11%', '5%', '7%', '3%', '2%', '4%', '0%', '2%', '0%', '2%', '0%', '0%', '0%', '2%', '2%']
Checking the number of elements in the 'freq_amount_percent' list.
len(freq_amount_percent)
18
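Rounding each class share independently does not always add up to 100; for these 18 frequencies plain rounding sums to 101. One common way to obtain whole percentages that sum to exactly 100 is the largest-remainder method, sketched below. This is an alternative offered for illustration, not the procedure used in the notebook, and ties between equal remainders may be resolved differently from a manual adjustment; the function name `rounded_percentages` is hypothetical.

```python
# Class frequencies printed earlier for the 18 ROI Percentage classes
freq_amount = [18, 10, 5, 6, 3, 4, 2, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1]

def rounded_percentages(counts):
    """Whole-number percentages that sum to 100 (largest-remainder method)."""
    total = sum(counts)
    floors = [c * 100 // total for c in counts]      # round every share down
    remainders = [c * 100 % total for c in counts]   # what rounding down discarded
    shortfall = 100 - sum(floors)                    # points still to hand out
    # Give one extra point to the classes with the largest remainders
    order = sorted(range(len(counts)), key=lambda k: remainders[k], reverse=True)
    for k in order[:shortfall]:
        floors[k] += 1
    return floors

percentages = rounded_percentages(freq_amount)
```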
Getting the Cumulative Frequency Amount of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[18, 28, 33, 39, 42, 46, 48, 49, 51, 51, 52, 52, 53, 53, 53, 53, 54, 55]
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
18
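`Cumulative` is defined elsewhere in the notebook. Assuming it returns the running totals of a list, the standard library's `itertools.accumulate` gives the same result, as this short sketch shows with the frequencies printed above.

```python
from itertools import accumulate

# Class frequencies printed earlier for the 18 ROI Percentage classes
freq_amount = [18, 10, 5, 6, 3, 4, 2, 1, 2, 0, 1, 0, 1, 0, 0, 0, 1, 1]

# Running total: each entry is the sum of all frequencies up to that class
freq_cumulative_amount = list(accumulate(freq_amount))
```

The final running total equals the sample size (55), which is a quick check that no value was missed or counted twice.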
Getting the Cumulative Frequency Percentage of the values in between the Upper Values and Lower Values for the Cumulative Frequency Distribution Table.
freq_cumulative_percent_demo = Cumulative(freq_amount_percent_demo1)
print(freq_cumulative_percent_demo)#showing the freq_cumulative_percent_demo list
[33, 51, 60, 71, 76, 83, 86, 88, 92, 92, 94, 94, 96, 96, 96, 96, 98, 100]
Checking the number of elements in the 'freq_cumulative_percent_demo' list.
len(freq_cumulative_percent_demo)
18
Turning the integers in the freq_cumulative_percent_demo list into strings with the percentage symbol.
freq_cumulative_percent = []
for i in freq_cumulative_percent_demo:
    freq_cumulative_percent.append("{:}%".format(i))
print(freq_cumulative_percent)#showing the freq_cumulative_percent list
['33%', '51%', '60%', '71%', '76%', '83%', '86%', '88%', '92%', '92%', '94%', '94%', '96%', '96%', '96%', '96%', '98%', '100%']
Checking the number of elements in the 'freq_cumulative_percent' list.
len(freq_cumulative_percent)
18
Creating the Cumulative Frequency Distribution Table of all the ROI Percentage of all the R-rated movies, using the necessary variables.
freq_cum_dis2 = pd.DataFrame({"Lower\nValue":lower_val,
"Upper\nValue":upper_val,
"Frequency (f)":freq_amount,
"Percentage (%)":freq_amount_percent,
"Cumulative\nFrequency":freq_cumulative_amount,
"Cumulative\nPercentage":freq_cumulative_percent})
The 'freq_cum_dis2' table. (this table is interactive)
freq_cum_dis2
| Lower Value | Upper Value | Frequency (f) | Percentage (%) | Cumulative Frequency | Cumulative Percentage |
|---|---|---|---|---|---|
Getting the Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
count10 = 0
count11 = 0
count12 = 0
count13 = 0
count14 = 0
for i in roi_per:
    if i < 200:
        count1+=1
    if 200 <= i < 400:
        count2+=1
    if 400 <= i < 600:
        count3+=1
    if 600 <= i < 800:
        count4+=1
    if 800 <= i < 1000:
        count5+=1
    if 1000 <= i < 1200:
        count6+=1
    if 1200 <= i < 1400:
        count7+=1
    if 1400 <= i < 1600:
        count8+=1
    if 1600 <= i < 1800:
        count9+=1
    if 1800 <= i < 2000:  # upper bound is exclusive so 2000 is not counted twice
        count10+=1
    if 2000 <= i < 2200:
        count11+=1
    if 2200 <= i < 2400:
        count12+=1
    if 2400 <= i < 2600:
        count13+=1
    if 2600 <= i <= 2800:
        count14+=1
freq_amount = [count1,count2,count3,count4,count5,count6,count7,count8,count9,count10,
count11,count12,count13,count14]
print(freq_amount)#showing the freq_amount list
[22, 7, 9, 4, 5, 2, 2, 1, 0, 1, 0, 0, 1, 1]
Checking the number of elements in the 'freq_amount' list.
len(freq_amount)
14
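Half-open intervals such as "200% to < 400%" map naturally onto a sorted list of edges and `bisect_right`, which places every value in exactly one class. A minimal sketch under the assumption that the last class is open-ended (2600% and above); `edges` is an illustrative name and the data are the `roi_per` values printed earlier.

```python
from bisect import bisect_right

roi_per = [350, 504, 40, 593, 575, 36, 156, 1327, 35, 418, 238, 44, 38, 81, 133,
           41, 2448, 195, 179, 65, 199, 263, 411, 601, 132, 815, 54, 250, 258, 5,
           1332, 1056, 501, 1081, 675, 731, 707, 465, 408, 4, 216, 970, 850, 1557,
           444, 16, 218, 2670, 813, 808, 74, 1852, 21, 13, 22]

# Interior class boundaries: 200, 400, ..., 2600 (13 edges give 14 classes)
edges = list(range(200, 2800, 200))

freq_amount = [0] * (len(edges) + 1)
for value in roi_per:
    # bisect_right counts the edges <= value, so a value equal to an edge
    # falls into the class that starts at that edge
    freq_amount[bisect_right(edges, value)] += 1
```

Since each value increments exactly one counter, a boundary value such as 2000 cannot be counted in two classes.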
Getting the Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo = []
for i in freq_amount:
    cum_rel_freq_demo.append(i/55*100)
cum_rel_freq_demo1 = [40,13,16,7,9,4,3,2,0,2,0,0,2,2]
print(cum_rel_freq_demo1)#showing the cum_rel_freq_demo1 list
[40, 13, 16, 7, 9, 4, 3, 2, 0, 2, 0, 0, 2, 2]
Checking the number of elements in the 'cum_rel_freq_demo1' list.
len(cum_rel_freq_demo1)
14
Getting the Cumulative Relative Frequency Percentage of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
cum_rel_freq_demo2 = Cumulative(cum_rel_freq_demo1)
print(cum_rel_freq_demo2)#showing the cum_rel_freq_demo2 list
[40, 53, 69, 76, 85, 89, 92, 94, 94, 96, 96, 96, 98, 100]
Checking the number of elements in the 'cum_rel_freq_demo2' list.
len(cum_rel_freq_demo2)
14
Turning the integers in the cum_rel_freq_demo2 list into strings with the percentage symbol.
cum_rel_freq_percent = []
for i in cum_rel_freq_demo2:
    cum_rel_freq_percent.append("{:}%".format(i))
print(cum_rel_freq_percent)#showing the cum_rel_freq_percent list
['40%', '53%', '69%', '76%', '85%', '89%', '92%', '94%', '94%', '96%', '96%', '96%', '98%', '100%']
Checking the number of elements in the 'cum_rel_freq_percent' list.
len(cum_rel_freq_percent)
14
Getting the Cumulative Frequency Amount of the values in between the Intervals for the Cumulative Relative Frequency Distribution Table.
freq_cumulative_amount = Cumulative(freq_amount)
print(freq_cumulative_amount)#showing the freq_cumulative_amount list
[22, 29, 38, 42, 47, 49, 51, 52, 52, 53, 53, 53, 54, 55]
Checking the number of elements in the 'freq_cumulative_amount' list.
len(freq_cumulative_amount)
14
Finalizing the Intervals for the Cumulative Relative Frequency Distribution Table.
intervals_cum = ['< 200%', '200% to < 400%', '400% to < 600%', '600% to < 800%', '800% to < 1000%',
                 '1000% to < 1200%', '1200% to < 1400%', '1400% to < 1600%', '1600% to < 1800%',
                 '1800% to < 2000%', '2000% to < 2200%', '2200% to < 2400%', '2400% to < 2600%', '>= 2600%']
print(intervals_cum)#showing the intervals_cum list
['< 200%', '200% to < 400%', '400% to < 600%', '600% to < 800%', '800% to < 1000%', '1000% to < 1200%', '1200% to < 1400%', '1400% to < 1600%', '1600% to < 1800%', '1800% to < 2000%', '2000% to < 2200%', '2200% to < 2400%', '2400% to < 2600%', '>= 2600%']
Checking the number of elements in the 'intervals_cum' list.
len(intervals_cum)
14
Creating the Cumulative Relative Frequency Distribution Table of all the ROI Percentage of all the R-rated movies, using the necessary variables.
cum_rel_freq2 = pd.DataFrame({"Return On Investment":intervals_cum,
                              "Frequency (f)":freq_amount,
                              "Cumulative Frequency":freq_cumulative_amount,
                              "Cumulative Relative Frequency Percentage":cum_rel_freq_percent,
                              })
The 'cum_rel_freq2' table. (this table is interactive)
cum_rel_freq2
| Return On Investment | Frequency (f) | Cumulative Frequency | Cumulative Relative Frequency Percentage |
|---|---|---|---|
Visualizing The Normal Distribution of all the ROI Percentage of all of the R-rated movies.
means = '510%'
std = '590%'
def make_gauss(N, sig, mu):
    # Normal density scaled by N; x, sig and mu are expressed in hundreds of percent
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-40, 40)
    s = [5.9]  # standard deviation: 590% expressed as 5.9
    m = [5.1]  # mean: 510% expressed as 5.1
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    plt.xlim(-40, 40)
    plt.ylim(0, .2)
    plt.legend(fontsize=11)
    plt.title('Variability of ROI Percentage of R-rated Movies\n Normal Distribution, Mean = 510%, StDev=590%',fontsize=14)
    plt.xlabel("ROI Percentage of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.grid(False)
    plt.show()
if __name__ == '__main__':
    main()
Visualizing The Variance of all the ROI Percentage of all of the R-rated movies.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.xlabel('Ranking of Values',fontsize=14)
plt.title("The Variance of all the ROI Percentage\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_roi_per_r.index, y=df_roi_per_r['ROI Percentage'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_per_r['ROI Percentage'].mean(), xmin=0, xmax=55, color='blue') # Mean line
plt.grid(False)
plt.show()# Telling matplotlib to show the chart
Visualizing The Variance using Two Standard Deviations of all the ROI Percentage of all of the R-rated movies.
#plt.ylim(-1,2.1) # Setting y limits so the axis are consistent
#plt.figure(figsize=(8,5))
plt.ylabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.xlabel('Position of Values',fontsize=14)
plt.title("The Variance of all the ROI Percentage\n of all the R-rated movies in the Drama genre",fontsize=14) # Setting the title
plt.scatter(x=df_roi_per_r.index, y=df_roi_per_r['ROI Percentage'], s=15, color='#ff5500'); # Plotting the scatter
plt.hlines(y=df_roi_per_r['ROI Percentage'].mean(), xmin=0, xmax=55, color='blue') # Mean line
for std_int in [-2, -1, 1, 2]: # Going through different stds from the mean
    standard_deviation = df_roi_per_r['ROI Percentage'].mean() + df_roi_per_r['ROI Percentage'].std()*std_int
    plt.hlines(y=standard_deviation,
               xmin=0,
               xmax=55,
               linestyles='dashed',
               colors='green'); # Line at std_int standard deviations from the mean
    # Giving labels to the lines we just drew
    plt.text(y=standard_deviation + 2, x=-10, s=std_int, ha='center')
plt.grid(False)
Visualizing Pearson’s Coefficient of Skewness of the ROI Percentage of all the R-rated movies.
import matplotlib.pyplot as plt
# An "interface" to matplotlib.axes.Axes.hist() method
n, bins, patches = plt.hist(x=roi_per, bins='auto', color='#ff5500',
alpha=0.7, rwidth=0.85)
plt.grid(False)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('ROI Percentage of R-rated Movies',fontsize=14)
plt.ylabel('Frequency',fontsize=14)
#plt.text( x=np.min(cost), y=0.1, s=r'$\mu=16 million, b=20 million$')
plt.title('The Pearson’s Coefficient of Skewness for the ROI Percentage\n of all R-rated movies is 1.24 (n=55)',fontsize=14)
Visualizing the Comparison of the Mode, Median and Mean of the ROI Percentage of all the R-rated movies.
# An "interface" to matplotlib.axes.Axes.hist() method
median_roi_per = statistics.median(roi_per)
mean_roi_per = 510
mode_roi_per = statistics.mode(roi_per)
n, bins, patches = plt.hist(x=roi_per, bins='auto', color='#ff5500',
alpha=0.2, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
names = ["median", "mean", "mode"]
colors = ['green', 'red', 'blue']
measurements = [median_roi_per, mean_roi_per, mode_roi_per]
for measurement, name, color in zip(measurements, names, colors):
    plt.axvline(x=measurement, linestyle='--', linewidth=2.5, label='{0} at {1}'.format(name, measurement), c=color)
plt.legend(fontsize=10);
plt.title('Comparison of Mode, Median and Mean in the Distribution\n of the ROI Percentage of all the R-rated Drama movies',fontsize=14)
Visualizing Chebyshev's Theorem on the ROI Percentage of all the R-rated movies (values within two standard deviations of the mean).
means = '510%'
std = '590%'
means1 = 5.1
std1 = 5.9
def make_gauss(N, sig, mu):
    # Gaussian density with amplitude N, standard deviation sig and mean mu
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-50, 50)
    s = [5.9]
    m = [5.1]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    # Shade the region within two standard deviations of the mean
    x = np.linspace(means1 - std1*2, means1 + std1*2)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    ax.annotate('at least 75%\n at least 41 obs', xy=(250,0.0035), xytext=(250,0.0020),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-50, 50)
    plt.ylim(0, .07)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI Percentage\nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI Percentage of R-rated Movies",fontsize=14)
    plt.ylabel("Density", fontsize=14)
    plt.show()
if __name__ == '__main__':
    main()
Visualizing Chebyshev's Theorem on the ROI Percentage of all the R-rated movies (values in the tails of the distribution).
means = '510%'
std = '590%'
means1 = 5.1
std1 = 5.9
def make_gauss(N, sig, mu):
    # Gaussian density with amplitude N, standard deviation sig and mean mu
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-40, 40)
    s = [5.9]
    m = [5.1]
    c = ['#ff5500']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2, label=f"$\\mu={means}$\n$\\sigma={std}$\n")
    # Shade the upper tail
    x = np.linspace(17, 20)
    y = norm.pdf(x, means1, std1)
    ax.fill_between(x, y, alpha=0.5, color='#ff5500')
    # Shade the lower tail; the density is evaluated on x1, not on x
    x1 = np.linspace(-7, -10)
    y1 = norm.pdf(x1, means1, std1)
    ax.fill_between(x1, y1, alpha=0.5, color='#ff5500')
    ax.annotate('at least 13.9%\n at least 8 obs',xy=(250,0.0035), xytext=(250,0.0020),
                arrowprops={'arrowstyle': '-|>'}, va='center', color='black',fontsize=11)
    plt.xlim(-40, 40)
    plt.ylim(0, .07)
    plt.legend(fontsize=10)
    plt.title('Chebyshevs Theorem on the ROI \nof the R-rated Movies in the Drama Genre (n=55)',fontsize=14)
    plt.xlabel("ROI of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.show()
if __name__ == '__main__':
    main()
Visualizing the KDE and Jittered plot of the ROI Percentage of all the R-rated movies.
import seaborn as sns
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500');
sns.violinplot( data=df_roi_r,inner=None,color='0.8').set(title='KDE and Jittered strip plot\n on the ROI Percentage of the r-rated movies')
plt.show()
Visualizing the KDE and Swarm plot of the ROI Percentage of all the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.swarmplot(data=df_roi_r, color='#ff5500');
sns.violinplot(data=df_roi_r, color='0.8', inner=None, alpha=.2).set(title='KDE and swarm plot\n on the ROI Percentage of the R-rated movies')
#sns.despine()
plt.show()
Visualizing the KDE and Rug plot of the ROI Percentage of all the R-rated movies.
sns.set(font_scale=1.2)
plt.gcf().set_size_inches(5.8, 6)
sns.set_style("whitegrid")
sns.stripplot(data=df_roi_r, color='#ff5500', jitter=False)
sns.violinplot(data=df_roi_r, split=True,inner=None,
scale="count", color='0.8', alpha=.1).set(title='KDE and rug plot\n on the ROI Percentage of the r-rated movies')
#sns.despine()
plt.show()
Styling the first portion of the Frequency Distribution Table of the ROI Percentage of all the R-rated movies.
freq_dis_per = freq_dis[:12].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
                       {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                                     ("font-size" , "12pt")]},# heading
                       {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# inside chart
                       {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
                       {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_dis_per dataframe to the freq_dis_per.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_per, 'freq_dis_per.png')
The 'freq_dis_per' dataframe.
Styling the second portion of the Frequency Distribution Table of the ROI Percentage of all the R-rated movies.
freq_dis_per1 = freq_dis[12:].style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
                       {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                                     ("font-size" , "12pt")]},# heading
                       {'selector':"td", "props":[("background-color","white"), ("color"," black")]},# inside chart
                       {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
                       {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_dis_per1 dataframe to the freq_dis_per1.png file as an image to be used for the analysis later on.
dfi.export(freq_dis_per1, 'freq_dis_per1.png')
The 'freq_dis_per1' dataframe.
Styling the Cumulative Frequency Distribution Table of the ROI Percentage of all the R-rated movies.
freq_cum_dis_per2 = freq_cum_dis2.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
                       {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                                     ("font-size" , "12pt")]},# heading
                       {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                                  ("font-size", "10pt")]},# inside chart
                       {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
                       {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the freq_cum_dis_per2 dataframe to the freq_cum_dis_per2.png file as an image to be used for the analysis later on.
dfi.export(freq_cum_dis_per2, 'freq_cum_dis_per2.png')
The 'freq_cum_dis_per2' dataframe.
Styling the Cumulative Relative Frequency Distribution Table of the ROI Percentage of all the R-rated movies.
cum_rel_freq_per2 = cum_rel_freq2.style.hide(axis="index")\
    .set_table_styles([{'selector' : '','props' : [('border','3px solid grey')]},
                       {"selector":"thead", 'props':[("background-color","#D3D3D3"),("color","black"),
                                                     ("font-size" , "12pt")]},# heading
                       {'selector':"td", "props":[("background-color","white"), ("color"," black"),
                                                  ("font-size", "10pt")]},# inside chart
                       {'selector':'th.row_heading', 'props':[('background-color','white'),('color','black')]},
                       {'selector':"td", "props":[('border-bottom','1px solid grey'),('border-right','1px solid grey')]}, ])
Saving the cum_rel_freq_per2 dataframe to the cum_rel_freq_per2.png file as an image to be used for the analysis later on.
dfi.export(cum_rel_freq_per2, 'cum_rel_freq_per2.png')
The 'cum_rel_freq_per2' dataframe.
Cumulative Relative Frequency Distribution Line Plot of the ROI Percentage of all the R-rated movies.
amount = [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]
freq = [40, 53, 69, 76, 85, 89, 92, 94, 94, 96, 96, 96, 98, 100]
plt.plot( amount, freq ,color='red', marker='o')
plt.title('Cumulative relative frequency (%) of \n the ROI Percentage made on R-rated movies', fontsize=14)
plt.xlabel('Amount of ROI', fontsize=14)
plt.ylabel('Cumulative relative frequency (%)', fontsize=14)
plt.grid(True)
plt.show()
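The `freq` list above is typed in by hand. As a minimal sketch, assuming the raw class counts are available, cumulative relative frequencies of this kind could be derived with NumPy instead of being hard-coded. The function name `cumulative_relative_frequency` and the example counts are hypothetical and are not taken from the notebook.

```python
import numpy as np

def cumulative_relative_frequency(freq):
    """Return the cumulative relative frequency (in percent) for class counts."""
    freq = np.asarray(freq, dtype=float)
    cumulative = np.cumsum(freq)            # running total of the counts
    return 100.0 * cumulative / freq.sum()  # share of all observations so far

# Hypothetical class counts, for illustration only (not the notebook's data).
example_counts = [4, 3, 2, 1]
example_percent = cumulative_relative_frequency(example_counts)
print(example_percent)
```

The last value is always 100, which is a quick sanity check on any hand-entered list such as `freq`.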
# Normal Distribution
import numpy as np
import matplotlib.pyplot as plt
def make_gauss(N, sig, mu):
    # Gaussian density with amplitude N, standard deviation sig and mean mu
    return lambda x: N/(sig * (2*np.pi)**.5) * np.e ** (-(x-mu)**2/(2 * sig**2))
def main():
    ax = plt.figure().add_subplot(1,1,1)
    x = np.arange(-300, 300)
    s = [21,111,5.9]  # standard deviations: Cost, ROI, ROI Percentage
    m = [16,60,5.1]   # means: Cost, ROI, ROI Percentage
    c = ['b','r','g']
    for sig, mu, color in zip(s, m, c):
        gauss = make_gauss(1, sig, mu)(x)
        ax.plot(x, gauss, color, linewidth=2)
    plt.xlim(-300, 300)
    plt.ylim(0, 0.07)
    plt.title('Variability of the Cost, ROI, ROI Percentage of R-rated Movies\n Normal Distribution',fontsize=14)
    plt.xlabel("Values of the Cost, ROI, ROI Percent of R-rated Movies",fontsize=14)
    plt.ylabel("Density",fontsize=14)
    plt.legend(['Cost', 'ROI', 'ROI%'], loc='best')
    plt.grid(False)
    plt.show()
if __name__ == '__main__':
    main()
The Distribution of the Budgets of |
The Variance of the Budgets of all the |
The Variance with Standard Deviations of the |
The Arithmetic Mean of the Budgets of all the R-rated Drama movies in this dataset is $16,450,866 and the Standard Deviation is $20,757,148, based on the 55 R-rated Drama movies used for this analysis. The Arithmetic Mean is the sum of all observations in a data set divided by the number of observations; in other words, it locates the central value of a sample. The Standard Deviation measures the dispersion of a data set relative to its mean, and it is also used as a measure of the riskiness of a decision. The larger the Standard Deviation, the more the data is spread around the mean, which, depending on the situation, can mean higher risk. The graph above is the Distribution of the Budgets of the R-rated Drama movies. The distribution is shifted to the right, which indicates that the mean is a large number, and it is more stretched out than the Normal Distribution, which indicates that the standard deviation is large. The shift to the right, the width of the distribution, and the fact that the standard deviation is greater than the arithmetic mean all indicate high variation between the values and a non-normal distribution. In other words, the data points, which are the budgets of all the R-rated Drama movies, are spread far from the arithmetic mean. What does this imply? The budget needed to produce an R-rated Drama movie is inconsistent: it can be far more or far less expensive than the arithmetic mean of $16,450,866. However, where does the majority of the dataset stand: on the expensive side or the inexpensive side of the spectrum?
The Variance of the Budgets of all the R-rated Drama movies is 430,859,201,847,042, based on the 55 R-rated Drama movies used for this analysis. The Variance is a measure of dispersion: it tells you how far the data points in a data set are spread from one another in relation to the mean. The larger the variance, the more spread out the data points are. Here the Variance is very large, in the hundreds of trillions, which shows that the data points are far from one another, or that there may be many outliers in the dataset. Because the data points are so spread out, the budget of an R-rated Drama movie is inconsistent: producing one could be very expensive, or it could sit on the inexpensive side of the spectrum. The graph above visualises the variance, with the blue line as the Arithmetic Mean of 16,450,868, which is the average Budget of all the R-rated Drama movies, and the red dots as the data points. In this graph, 17 out of 55 movies had Budgets larger than the Arithmetic Mean of 16,450,868 and 38 out of 55 movies had Budgets smaller than the Arithmetic Mean. This tells us that even though the cost of creating R-rated Drama movies is inconsistent, due to its high variability, the majority of the data sits on the inexpensive end: far more movies were produced on smaller budgets than on larger ones. There is a high chance that anyone planning to produce an R-rated Drama movie will not have to spend as much money to do so.
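The link between the two spread measures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming only the two figures reported in the text; the variable names are hypothetical, and the small gap is expected because the reported standard deviation is rounded to whole dollars.

```python
import math

# Figures quoted in the text for the 55 R-rated Drama budgets.
reported_std = 20_757_148
reported_variance = 430_859_201_847_042

# The variance is the square of the standard deviation.
variance_from_std = reported_std ** 2
relative_gap = abs(variance_from_std - reported_variance) / reported_variance

print(f"std squared:  {variance_from_std:,}")
print(f"reported var: {reported_variance:,}")
print(f"relative gap: {relative_gap:.2e}")

# The square root of the variance recovers the standard deviation.
std_from_variance = math.sqrt(reported_variance)
```

Because the variance is expressed in squared dollars, the standard deviation is usually the easier of the two to compare against the mean budget.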
What is a Normal Distribution? It is called the normal distribution because early statisticians noticed the same bell-curve shape coming up over and over again in different distributions. It is also the most common type of distribution assumed in technical stock market analysis. What is the Empirical Rule and how is it applied to Normal Distributions? The Empirical Rule says that almost all observed data will fall within three standard deviations of the mean or average.
The Empirical Rule is also referred to as the three-sigma rule or the 68-95-99.7 rule, because about 68% of the data falls within one standard deviation of the mean, about 95% within two standard deviations and about 99.7% within three standard deviations.
The graph above is a scatter plot of the Budgets of all the R-rated Drama movies in the dataset. It also shows a blue line, which is the Arithmetic Mean or CL (Central Line) of the data, along with the UCL (Upper Control Limit), the LCL (Lower Control Limit) and the three standard deviations from the mean.
This scatter plot shows that the data is not normally distributed. It also shows that the budgets for creating R-rated Drama movies are skewed, which means the majority of the budgets are on the inexpensive side. The data stops at -1 SD, so the lower bound is -1 SD. The data reaches +3 SD; however, the upper bound is taken as +2 SD rather than +3 SD, because the one data point above the UCL is out of control compared to the variation of the other data points and should be avoided. That data point is a budget of $100 million spent on producing an R-rated Drama movie, so $100 million appears to be a budget to avoid when producing R-rated Drama movies. The lower bound of the budget is $100,000 and the true upper bound is $61 million. This means the lowest budget that should be spent producing an R-rated Drama movie is $100,000 and the highest is $61 million. But why?
The Distribution of the Budgets |
The Variance of the Budgets |
The Distribution of the Budgets of all the R-rated Drama movies is not a Normal Distribution, so the Empirical Rule (the 68-95-99.7 rule) does not apply to this particular dataset. The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then it is stated in terms of approximations. A method that applies to every data set is Chebyshev's Theorem, which will be used to break down the Distribution. The theorem estimates the minimum proportion of observations that fall within a specified number of standard deviations from the mean. It applies to a broad range of probability distributions and helps you determine where most of the data falls within a distribution of values.
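The minimum proportion the theorem guarantees is 1 - 1/k² for k standard deviations, with k greater than 1. A minimal sketch of that bound follows, assuming the sample size of 55 quoted in the text; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2

n_movies = 55  # sample size quoted in the text

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k={k}: at least {bound:.1%} of the budgets, "
          f"i.e. {bound * n_movies:.2f} of {n_movies} movies")
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it suits the skewed budget data.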
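Chebyshev's Theorem states that, for any number k greater than 1, at least 1 - 1/k² of the observations fall within k standard deviations of the mean.
The Skewness of the Budgets of |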
The Mean, Median and Mode in relation to the |
The Mean, Median and Mode in relation to the |
The Kernel Density Estimate on the Skewness of the |
The first graph above is a histogram of the Budgets of all the R-rated Drama movies in this data set, together with the Pearson's Coefficient of Skewness of the data set. The histogram is a graphed representation of the budgets organized into specified ranges. It condenses the data into ranges, grouped into columns along the horizontal x-axis. The vertical y-axis represents the count or percentage of occurrences in the data for each column. The columns visualize the pattern of the budgets spent on producing R-rated Drama movies. Histograms are commonly used to show how many values of a variable occur within a specific range.
Pearson's Coefficient of Skewness is a method created by Karl Pearson to indicate whether a data set is skewed. There are two methods. The first subtracts the mode from the mean and divides the result by the standard deviation. The second method, which uses the median instead of the mode, will be used for this analysis.
The second method of Pearson's Coefficient of Skewness is calculated by multiplying the difference between the mean and the median by three, then dividing the result by the standard deviation. A value of zero means the distribution has no skewness at all, a positive value means the distribution is positively (right) skewed, and a negative value means the distribution is negatively (left) skewed. The Pearson's Coefficient of Skewness of the Budgets of all the R-rated Drama Movies is 1.07; this is a positive number, which means the distribution is right-skewed.
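The second method can be written as a one-line function. The sketch below is an illustration, not the notebook's own code: the function name is hypothetical, and the inputs are the mean and standard deviation quoted earlier together with the rounded median of $9 million reported in this section, so the result only approximates the 1.07 stated above.

```python
def pearson_second_skewness(mean, median, std):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / std."""
    return 3.0 * (mean - median) / std

# Budget statistics quoted in the text (the median is a rounded figure).
skew = pearson_second_skewness(mean=16_450_866, median=9_000_000, std=20_757_148)
print(f"Pearson's second coefficient: {skew:.3f}")

# A symmetric case, where mean and median coincide, gives zero skewness.
symmetric = pearson_second_skewness(mean=10.0, median=10.0, std=2.0)
```

A positive result means the mean sits above the median, which matches the right skew seen in the histogram.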
As you can see, the histogram is right-skewed. A right-skewed distribution is asymmetrical. The budgets of R-rated Drama movies have a natural limit of $0, because a movie cannot cost less than nothing to produce. This limit prevents outcomes on one side (the negative side), so the distribution's peak is off center, toward the limit, and a tail stretches away from it, making the distribution skewed.
The second visualization emphasizes the comparison of the mode, median and mean in order to show the skewness. In the graph, the three statistics appear in the order mode, median, mean: the mean is $16.4 million, the median is $9 million and the mode is $2 million, so the mean is the largest of the three and the mode is the smallest. Generally, if the distribution of the data is skewed to the right, the mode is often less than the median, which is less than the mean. In a symmetric distribution, we expect the mean and median to be equal in value. This is a significant connection between the shape of the distribution and the relationship between the mean and the median.
The Distribution using the Violin Plot and |
The Distribution using the Violin Plot and |
The Distribution using the Violin Plot and |
The Central Tendency using the Violin Plot and |
All five of the graphs above are violin plots. A violin plot is a hybrid of a box plot and a kernel density plot, and it is used to visualize the distribution of numerical data. Whereas box plots only show summary statistics, violin plots depict summary statistics and the density of each variable.
The Distribution using the Violin Plot and Rug plot of all the R-rated movies in the Drama Genre in this dataset: A Rug plot is a plot of points for a single quantitative variable, displayed as marks along just the x-axis or just the y-axis. Like the other plots, it is used to visualise the distribution of the dataset, and it can be seen as a one-dimensional scatter plot. Based on the violin KDE and rug plot of the budgets of the R-rated Drama movies, there is a higher probability that a producer will spend around $8 million to create an R-rated movie in the Drama genre, and a lower probability that $100 million will be spent.
The Distribution using the Violin Plot and Swarm plot of all the R-rated Movies in the Drama Genre in this dataset: A swarm plot, also referred to as a bee swarm plot, is similar to a strip plot in that it plots all the data points on the graph. Swarm plots avoid obscuring points by calculating non-overlapping positions instead of adding random jitter that may overlap. This arrangement gives them the appearance of a swarm of bees, which is how they got their name. Based on the KDE and swarm plot of the budgets of the R-rated Drama movies, there are roughly 9 groups of 2-3 elements that share the exact same budget. About 2/3 of the groups are found between $0 and $10 million. The remaining 1/3 of the groups are found between $20 million and $40 million and between $50 million and $60 million.
The Distribution using the Violin Plot and Jittered plot of all the R-rated Movies in the Drama Genre in this dataset: A Box plot, also referred to as a box and whisker plot, was created to represent the spread and center of a data set; it shows you how your data is spread out. Measures of spread include the range and the interquartile range of the data set. Measures of center include the mean and the median of the data set.
Reading the Box plot;
Frequency Distribution Table |
Cumulative Frequency Distribution Table |
The Bernoulli Distribution of the Budgets |
The Bernoulli Distribution of the Budgets |
The Bernoulli Distribution of the Budgets |
The Bernoulli Distribution of the Budgets |
After using the Cumulative Frequency Distribution Table to predict the likely Budget that will be spent creating R-rated Drama Movies, the Bernoulli Distribution will be used to predict what is expected to be spent, and the probability of spending it, when creating R-rated Drama Movies. Jacob Bernoulli, a Swiss mathematician, created the Bernoulli Distribution. The Bernoulli Distribution is a special case of the Binomial Distribution in which a single trial is conducted, so that the number of observations is 1. It is a discrete probability distribution with only two possible values for the random variable. The two possible outcomes are labeled n=0 and n=1, where n=1 means success, which occurs with probability p, and n=0 means failure, which occurs with probability 1-p. The probability mass function (PMF) of a Bernoulli Distribution is defined as follows: if a trial has only two possible outcomes, "success" and "failure", and p is the probability of success, then Px(1) = P{X = 1} = p, and the probability of failure is Px(0) = P{X = 0} = 1 - p.
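The probability mass function described above can be sketched in a few lines. This is an illustrative implementation only; the function name `bernoulli_pmf` and the example value of p are hypothetical.

```python
def bernoulli_pmf(x, p):
    """Probability that a Bernoulli(p) variable takes the value x (0 or 1)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if x == 1:
        return p          # success
    if x == 0:
        return 1.0 - p    # failure
    return 0.0            # any other value is impossible

# Hypothetical success probability, for illustration only.
p_example = 0.3
print("P(X=1) =", bernoulli_pmf(1, p_example))
print("P(X=0) =", bernoulli_pmf(0, p_example))
```

The two probabilities always add up to 1, and the expected value of X equals p.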
Budgets for creating movies fall into 4 categories: Micro-Budget, Low-Budget, Mid-Budget and High-Budget. The budgets of the 55 R-rated Drama Movies were put into those four categories. Then the probability mass function of the Bernoulli Distribution was used to predict the probability of spending within each category. The four categories are what can be expected to be spent when creating R-rated Drama Movies, and the PMF of the Bernoulli Distribution gives the probability of each category actually occurring. The four graphs above are the Bernoulli Distributions for each of the 4 categories (micro, low, mid, high) of the Budgets of R-rated Drama Movies.
The first graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Micro-Budgets. The purpose of the graph is to give the probability that the budget of a new R-rated Drama movie would be a micro-budget. The y-axis is the probability and the x-axis shows the values of the random variable X, where failure is labeled 0 and success is labeled 1. For micro-budgets, the probability of success (1) is 0.036 and the probability of failure (0) is 0.964. The expected value of the random variable X is E[X] = p; with p = 0.036, E[X] = 0.036.
The second graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Low-Budgets. The purpose of the graph is to give the probability that the budget of a new R-rated Drama movie would be a low-budget. The y-axis is the probability and the x-axis shows the values of the random variable X, where failure is labeled 0 and success is labeled 1. For low-budgets, the probability of success (1) is 0.65 and the probability of failure (0) is 0.35. The expected value of the random variable X is E[X] = p; with p = 0.65, E[X] = 0.65.
The third graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are Mid-Budgets. The purpose of the graph is to give the probability that the budget of a new R-rated Drama movie would be a mid-budget. The y-axis is the probability and the x-axis shows the values of the random variable X, where failure is labeled 0 and success is labeled 1. For mid-budgets, the probability of success (1) is 0.18 and the probability of failure (0) is 0.82. The expected value of the random variable X is E[X] = p; with p = 0.18, E[X] = 0.18.
The fourth graph is the Bernoulli Distribution of the Budgets of R-rated Drama Movies that are High-Budgets. The purpose of the graph is to give the probability that the budget of a new R-rated Drama movie would be a high-budget. The y-axis is the probability and the x-axis shows the values of the random variable X, where failure is labeled 0 and success is labeled 1. For high-budgets, the probability of success (1) is 0.13 and the probability of failure (0) is 0.87. The expected value of the random variable X is E[X] = p; with p = 0.13, E[X] = 0.13.
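The four Bernoulli Distributions described above can be summarized side by side. The sketch below uses only the success probabilities reported in the text; the dictionary and function names are hypothetical. The reported probabilities are rounded, so their sum is close to 1 rather than exactly 1.

```python
# Success probabilities reported in the text for each budget category.
category_p = {"micro": 0.036, "low": 0.65, "mid": 0.18, "high": 0.13}

def bernoulli_summary(p):
    """Failure probability, expected value and variance of a Bernoulli(p) variable."""
    return {"failure": 1.0 - p, "mean": p, "variance": p * (1.0 - p)}

summaries = {name: bernoulli_summary(p) for name, p in category_p.items()}

for name, stats in summaries.items():
    print(f"{name:>5}: success={category_p[name]:.3f} "
          f"failure={stats['failure']:.3f} "
          f"E[X]={stats['mean']:.3f} Var[X]={stats['variance']:.4f}")

# The four categories cover every movie, so the probabilities should sum to about 1.
total_probability = sum(category_p.values())
most_likely = max(category_p, key=category_p.get)
```

Here the low-budget category is the most likely outcome, in line with the skew toward inexpensive productions described earlier.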
The Bernoulli Distribution of on the Sub-groups within the Budgets of R-rated Drama Movies that are Micro-Budgets |
The Skewness of the Budgets of |
The Mean, Median and Mode in relation to the |
The Kernel Density Estimate on the Skewness of the |
Conclusion Part A: The second sub-group in the Low-Budget category, with the range of $5 Million to $10 Million, has the lowest probability within the Low-Budget category. The first sub-group, with the range of $1 Million to $5 Million, has the highest probability within the Low-Budget category. Ergo, organizations, businesses, studios or production companies in filmmaking that are planning to produce R-rated Drama Movies on a Low-Budget can expect to spend between $1 Million and $5 Million, and there is a 57% chance of that happening.
The first and third sub-groups in the Mid-Budget category, with the ranges of $15 Million to $20 Million and $30 Million to $50 Million, have exactly the same probability, which is the lowest probability within the Mid-Budget category. The second sub-group, with the range of $20 Million to $30 Million, has the highest probability within the Mid-Budget category. Ergo, organizations, businesses, studios or production companies in filmmaking that are planning to produce R-rated Drama Movies on a Mid-Budget can expect to spend between $20 Million and $30 Million, and there is a 40% chance of that happening.
The second and third sub-groups in the High-Budget category, with the ranges of $60 Million to $70 Million and $90 Million to $100 Million, have exactly the same probability, which is the lowest probability within the High-Budget category. The first sub-group, with the range of $50 Million to $60 Million, has the highest probability within the High-Budget category. Ergo, organizations, businesses, studios or production companies in filmmaking that are planning to produce R-rated Drama Movies on a High-Budget can expect to spend between $50 Million and $60 Million, and there is a 72% chance of that happening.
Conclusion Part B: The sub-groups with the lowest probability across all the budgets of R-rated Drama Movies in this data set are the sub-group with the range of $60-$70 Million, with a probability of 1.8%, and the sub-group with the range of $90-$100 Million, also with a probability of 1.8%. The two sub-groups have exactly the same probability and are both from the High-Budget category. The sub-group with the highest probability across all the budgets of R-rated Drama Movies in this data set is the sub-group with the range of $1-$5 Million, with a probability of 36.3%; this sub-group is from the Low-Budget category. Ergo, organizations, businesses, studios or production companies in filmmaking that are planning to produce R-rated Drama Movies can expect a budget between $1 Million and $5 Million, and there is a 36.3% chance of that happening.
#<center><img src="a1.png" style="width:23%"> <img src="a2.png" style="width:23%"> <img src="a3.png" style="width:23%"></center>
#<center><h3 style='color:#fbec5d'>System Rating R:</h3></center><p>The average ROI in system R rating is 21.10. Twenty movies in this rating are above the average,
#which means 56% of the entire system R rating is above 21.10. The 3rd quartile of the ROI in this system rating is 33.40, which marks the threshold for the highest ROI values; seven movies are above it, making up 23% of the entire system R rating.
#<span style='color:#fbec5d'>For every dollar spent, how much did each movie make?</span> This question assumes that every movie had the same budget of one dollar and asks how much each movie would generate.
#The objective is to identify which movie is the most efficient in each system rating and which system rating is the most efficient.
The average ROI in system G rating is 17.13 dollars. Twelve movies in this rating are above the average, which means 44% of the entire system G rating is above 17.13 dollars. The 3rd quartile of the ROI in this system rating is 22.05 dollars, which marks the threshold for the highest ROI values; six movies are above it, making up 22% of the entire system G rating.
The average ROI in system PG rating is 21.13 dollars. Twelve movies in this rating are above the average, which means 44% of the entire system PG rating is above 21.13 dollars. The 3rd quartile of the ROI in this system rating is 31.70 dollars, which marks the threshold for the highest ROI values; seven movies are above it, making up 26% of the entire system PG rating.
The average ROI in system PG-13 rating is 20.94 dollars. Thirteen movies in this rating are above the average, which means 48% of the entire system PG-13 rating is above 20.94 dollars. The 3rd quartile of the ROI in this system rating is 26.45 dollars, which marks the threshold for the highest ROI values; nine movies are above it, making up 33% of the entire system PG-13 rating.
The average ROI in system NR rating is 26.90 dollars. Sixteen movies in this rating are above the average, which means 60% of the entire system NR rating is above 26.90 dollars. The 3rd quartile of the ROI in this system rating is 38.05 dollars, which marks the threshold for the highest ROI values; eight movies are above it, making up 30% of the entire system NR rating.
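The per-rating summaries above each report an average ROI, a third quartile, and the share of movies above each value. The following is a minimal sketch of how such figures could be computed with the standard library. The movie titles and dollar amounts are invented, and the ROI definition used here (worldwide gross per dollar of budget) is an assumption based on the "for every dollar spent" framing, not a formula confirmed by the source.

```python
import statistics

# Hypothetical (worldwide gross, production budget) pairs in dollars.
movies = {
    "Movie A": (40_000_000, 2_000_000),
    "Movie B": (30_000_000, 3_000_000),
    "Movie C": (25_000_000, 5_000_000),
    "Movie D": (20_000_000, 10_000_000),
    "Movie E": (24_000_000, 3_000_000),
}

# Assumed definition: dollars of worldwide gross per dollar of budget.
roi = {title: gross / budget for title, (gross, budget) in movies.items()}
values = list(roi.values())

mean_roi = statistics.mean(values)
# quantiles(..., n=4) returns the three quartile cut points; keep the third.
_, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

above_mean = [title for title, v in roi.items() if v > mean_roi]
above_q3 = [title for title, v in roi.items() if v > q3]
share_above_mean = 100 * len(above_mean) / len(values)
share_above_q3 = 100 * len(above_q3) / len(values)

print(mean_roi, q3, share_above_mean, share_above_q3)
```

Note that the third quartile is a cut-off with roughly a quarter of the values above it, so it is a threshold for the top group rather than the single highest ROI.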
This is the blueprint for creating the second visualization, Gross Profit Margin Percentage. Altair will be used to create this graph.
Blueprint:
The data frame that Altair uses to create this graph consists of four columns;
The dataframes will consist of these six columns;
This is the 'Drama_DataFrame' dataframe. (This dataframe is interactive.)
Drama_DataFrame
| Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
These are the variables needed to create the columns for the first dataframe: 'df1'
Getting all the Names of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
name = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        name.append(Drama_DataFrame.Movie[i])
print(name)
['Django Unchained', 'Gone Girl', 'Priest', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Crimson Peak', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'The Master', 'Flight', 'The Ides of March', 'Nocturnal Animals', 'The Water Diviner', 'For Colored Girls', 'The Debt', 'Let Me In', 'Black Swan', 'Ex Machina', 'Room', 'If Beale Street Could Talk', 'Arbitrage', 'Stoker', 'Carol', 'Quartet', 'Hereditary', 'Melancholia', 'Manchester by the Sea', 'We Need to Talk About Kevin', 'Addicted', 'Mommy', 'Take Shelter', 'Boyhood', 'The Witch', 'Margin Call', 'Whiplash', 'Before Midnight', 'Silent House', "Winter's Bone", 'The Florida Project', 'We Are Your Friends', 'Locke', 'Knock Knock', 'Buried', 'Unsane', 'Blue Valentine', 'Martha Marcy May Marlene', 'Palo Alto', 'Sound of My Voice', 'A Ghost Story', 'Ordinary People', 'Fame', 'Endless Love', 'Ghost Story', 'Zoot Suit', 'Rich and Famous', 'Raggedy Man']
Getting all the Worldwide Revenue in Dollars of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
world_cur = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        world_cur.append(Drama_DataFrame.Worldwide_Gross_x[i])
print(world_cur)
['$449,948,323', '$368,567,189', '$84,154,026', '$381,398,492', '$371,350,619', '$74,966,854', '$134,612,435', '$570,998,101', '$50,647,416', '$160,558,438', '$77,735,925', '$32,398,681', '$31,054,727', '$38,017,873', '$46,604,054', '$28,270,399', '$331,266,710', '$38,358,392', '$36,262,783', '$19,859,167', '$35,830,713', '$12,034,913', '$42,843,521', '$56,178,935', '$70,133,905', '$21,817,298', '$77,733,867', '$10,765,283', '$17,499,242', '$17,536,004', '$4,972,016', '$57,273,049', '$40,454,520', '$20,433,227', '$38,969,037', '$23,251,930', '$16,610,760', '$16,131,551', '$11,295,324', '$10,153,415', '$2,088,390', '$6,328,516', '$21,270,290', '$14,244,931', '$16,566,240', '$5,438,911', '$1,156,309', '$429,448', '$2,769,782', '$54,766,923', '$77,211,836', '$34,718,173', '$1,951,683', '$3,256,082', '$13,000,000', '$11,000,000']
Getting all the Worldwide Revenue in Integer of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
world_int = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        world_int.append(Drama_DataFrame.Worldwide_Gross[i])
print(world_int)
[449948323, 368567189, 84154026, 381398492, 371350619, 74966854, 134612435, 570998101, 50647416, 160558438, 77735925, 32398681, 31054727, 38017873, 46604054, 28270399, 331266710, 38358392, 36262783, 19859167, 35830713, 12034913, 42843521, 56178935, 70133905, 21817298, 77733867, 10765283, 17499242, 17536004, 4972016, 57273049, 40454520, 20433227, 38969037, 23251930, 16610760, 16131551, 11295324, 10153415, 2088390, 6328516, 21270290, 14244931, 16566240, 5438911, 1156309, 429448, 2769782, 54766923, 77211836, 34718173, 1951683, 3256082, 13000000, 11000000]
Getting all the Profit in Dollars of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
profit_cur = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        profit_cur.append(Drama_DataFrame.Profit_x[i])
print(profit_cur)
['$349,948,323', '$307,567,189', '$24,154,026', '$326,398,492', '$316,350,619', '$19,966,854', '$82,112,435', '$530,998,101', '$13,147,416', '$129,558,438', '$54,735,925', '$9,898,681', '$8,554,727', '$17,017,873', '$26,604,054', '$8,270,399', '$318,266,710', '$25,358,392', '$23,262,783', '$7,859,167', '$23,830,713', '$34,913', '$31,043,521', '$45,178,935', '$60,133,905', '$12,417,298', '$69,233,867', '$3,765,283', '$12,499,242', '$12,636,004', '$222,016', '$53,273,049', '$36,954,520', '$17,033,227', '$35,669,037', '$20,251,930', '$14,610,760', '$14,131,551', '$9,295,324', '$8,153,415', '$88,390', '$4,328,516', '$19,282,640', '$12,744,931', '$15,566,240', '$4,438,911', '$156,309', '$294,448', '$2,669,782', '$48,766,923', '$68,711,836', '$14,718,173', '$1,851,683', '$556,082', '$1,500,000', '$2,000,000']
Getting all the Profit in Integer of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
profit_int = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        profit_int.append(int(Drama_DataFrame.Profit[i]))
print(profit_int)
[349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 34913, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]
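The five lists above all apply the same filter (rating equals 'R' and profit greater than zero) and differ only in which column they read. A minimal sketch of applying that filter once and then reading every column from the filtered rows is shown here in plain Python. The `rows` list is a hypothetical stand-in for `Drama_DataFrame`: the Django Unchained, Gone Girl and Hugo figures are copied from the printed outputs in this section, while "Example Flop" is invented to show a row being rejected. `profitable` is a hypothetical helper name.

```python
# Hypothetical stand-in for Drama_DataFrame as a list of records.
rows = [
    {"Movie": "Django Unchained", "Rating": "R",
     "Worldwide_Gross": 449948323, "Profit": 349948323},
    {"Movie": "Gone Girl", "Rating": "R",
     "Worldwide_Gross": 368567189, "Profit": 307567189},
    {"Movie": "Hugo", "Rating": "PG",
     "Worldwide_Gross": 180047784, "Profit": 47784},
    # Invented row: an R-rated movie that lost money, so it is filtered out.
    {"Movie": "Example Flop", "Rating": "R",
     "Worldwide_Gross": 1000000, "Profit": -500000},
]


def profitable(records, rating):
    """Keep only records with the given rating and a positive profit."""
    return [r for r in records if r["Rating"] == rating and r["Profit"] > 0]


r_rows = profitable(rows, "R")

# Every column is read from the same filtered rows, so the lists stay aligned.
name = [r["Movie"] for r in r_rows]
world_int = [r["Worldwide_Gross"] for r in r_rows]
profit_int = [r["Profit"] for r in r_rows]

print(name)
print(world_int)
print(profit_int)
```

With pandas, the equivalent idea would be a boolean mask such as `Drama_DataFrame[(Drama_DataFrame.Rating == 'R') & (Drama_DataFrame.Profit > 0)]`, after which each column can be read from the filtered frame.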
Creating a list consisting of 'R' repeated 56 times for the R-rated category due to it having 56 movies for the new dataframe that will be created below.
size = list('R'*56)
print(size)
['R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R']
Getting all the Net Profit Margin of the movies that are R-rated from the 'Drama_DataFrame' dataframe.
npm = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] > 0:
        npm.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(npm)
[77, 83, 28, 85, 85, 26, 60, 92, 25, 80, 70, 30, 27, 44, 57, 29, 96, 66, 64, 39, 66, 0, 72, 80, 85, 56, 89, 34, 71, 72, 4, 93, 91, 83, 91, 87, 87, 87, 82, 80, 4, 68, 90, 89, 93, 81, 13, 68, 96, 89, 88, 42, 94, 17, 11, 18]
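The margin computed above is profit divided by worldwide gross, times 100, truncated to an integer with `int()`. The sketch below reproduces that arithmetic for three R-rated titles whose profit and gross figures appear in the printed lists above (Django Unchained, Priest and Stoker), and contrasts truncation with rounding to one decimal place. `net_profit_margin` and its `digits` parameter are hypothetical names introduced for illustration.

```python
def net_profit_margin(profit, revenue, digits=None):
    """Profit as a percentage of revenue.

    With digits=None the value is truncated with int(), matching the loop
    above; otherwise it is rounded to the given number of decimal places.
    """
    pct = profit / revenue * 100
    return int(pct) if digits is None else round(pct, digits)


# (profit, worldwide gross) pairs copied from the printed lists above.
samples = {
    "Django Unchained": (349948323, 449948323),
    "Priest": (24154026, 84154026),
    "Stoker": (34913, 12034913),
}

truncated = {t: net_profit_margin(p, g) for t, (p, g) in samples.items()}
one_decimal = {t: net_profit_margin(p, g, digits=1) for t, (p, g) in samples.items()}

print(truncated)
print(one_decimal)
```

Truncation explains the 0 in the printed list: Stoker made a small positive profit, but its margin is below 1%, so `int()` drops it to zero. Keeping one decimal place would preserve that distinction if it matters for the chart.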
Converting the list consisting of Net Profit Margin of all the R-rated movies from integer to percentage.
npm_percent = []
for i in npm:
    npm_percent.append("{:}%".format(i))
print(npm_percent)
['77%', '83%', '28%', '85%', '85%', '26%', '60%', '92%', '25%', '80%', '70%', '30%', '27%', '44%', '57%', '29%', '96%', '66%', '64%', '39%', '66%', '0%', '72%', '80%', '85%', '56%', '89%', '34%', '71%', '72%', '4%', '93%', '91%', '83%', '91%', '87%', '87%', '87%', '82%', '80%', '4%', '68%', '90%', '89%', '93%', '81%', '13%', '68%', '96%', '89%', '88%', '42%', '94%', '17%', '11%', '18%']
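The conversion above appends a percent sign to each integer margin. As a small sketch, the same result can be written as a list comprehension with an f-string; the three input values are the first three entries of the printed `npm` list.

```python
# First three R-rated margins from the printed npm list above.
npm = [77, 83, 28]

# One formatted string per margin, in the same order as the input.
npm_percent = [f"{value}%" for value in npm]
print(npm_percent)
```

The resulting strings are convenient as chart labels or tooltips, while the integer list remains the one to use for any numeric axis.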
Creating a list consisting of 'Revenue' repeated 56 times and 'Profit' repeated 56 times for the R-rated category, because it has 56 movies, for the new dataframe that will be created below.
r_rate = []
for i in range(56):
    r_rate.append('Revenue')
for i in range(56):
    r_rate.append('Profit')
print(r_rate)
['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
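A label list that is twice as long as the movie list suggests a long-format table, where each movie contributes one 'Revenue' row and one 'Profit' row. The sketch below shows one way such rows could line up. The pairing of labels with values (names repeated twice, revenue values followed by profit values) is an assumption about how the dataframe is assembled, since that step is not shown in this part of the notebook. The two movies and their figures are taken from the printed lists above.

```python
# First two R-rated entries from the printed lists above.
name = ["Django Unchained", "Gone Girl"]
world_int = [449948323, 368567189]
profit_int = [349948323, 307567189]

n = len(name)

# Derive the label list from the data length instead of a hard-coded count.
r_rate = ["Revenue"] * n + ["Profit"] * n

# Assumed layout: all revenue rows first, then all profit rows.
long_rows = list(zip(name * 2, r_rate, world_int + profit_int))

for row in long_rows:
    print(row)
```

Deriving the repeat count from `len(name)` keeps the labels in step with the filtered movies if the underlying data changes.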
These are the variables needed to create the columns for the second dataframe: 'df2'
Getting all the Names of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
name1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        name1.append(Drama_DataFrame.Movie[i])
print(name1)
['Hugo', 'Dolphin Tale', 'Wonder', 'The Last Song', 'War Room', 'The Lunchbox', 'Somewhere in Time', 'Urban Cowboy', 'Cinderella', 'War Room', 'Wonder', 'Little Women', 'Overcomer', 'The Jazz Singer', 'A Walk to Remember', 'Tuck Everlasting', 'Dreamer', 'The Lake House', 'Akeelah and the Bee', 'Bridge to Terabithia', 'August Rush', 'Fireproof', 'The Last Song', "God's Not Dead", "Mr. Holland's Opus", 'Phenomenon', 'Contact', 'The Spanish Prisoner', 'Sense and Sensibility', 'The Secret of Roan Inish', 'The Remains of the Day', 'Pure Country', 'Forever Young', 'A River Runs Through It', 'Honeysuckle Rose', 'Resurrection', 'Taps', 'On Golden Pond', 'Absence of Malice', 'The Night the Lights Went Out in Georgia', 'Rocky III', 'Tex', 'Staying Alive', 'Tender Mercies', 'Footloose', 'The Natural']
Getting all the Worldwide Revenue in Dollars of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
world_cur1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        world_cur1.append(Drama_DataFrame.Worldwide_Gross_x[i])
print(world_cur1)
['$180,047,784', '$96,068,724', '$304,604,712', '$92,678,948', '$73,975,239', '$12,231,500', '$9,709,597', '$46,918,287', '$542,351,353', '$73,986,904', '$305,937,718', '$216,601,214', '$38,102,988', '$27,118,000', '$47,494,916', '$19,344,615', '$38,741,732', '$114,830,111', '$18,948,425', '$137,587,063', '$64,605,762', '$33,473,297', '$89,137,047', '$64,667,874', '$106,269,971', '$152,036,382', '$171,120,329', '$13,835,130', '$134,582,776', '$6,101,815', '$63,954,968', '$15,164,458', '$127,956,187', '$43,440,294', '$17,815,212', '$157,297,525', '$35,856,053', '$119,285,432', '$40,716,963', '$14,923,752', '$125,052,686', '$549,368,315', '$64,892,670', '$8,443,124', '$80,008,942', '$48,000,000']
Getting all the Worldwide Revenue in Integer of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
world_int1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        world_int1.append(Drama_DataFrame.Worldwide_Gross[i])
print(world_int1)
[180047784, 96068724, 304604712, 92678948, 73975239, 12231500, 9709597, 46918287, 542351353, 73986904, 305937718, 216601214, 38102988, 27118000, 47494916, 19344615, 38741732, 114830111, 18948425, 137587063, 64605762, 33473297, 89137047, 64667874, 106269971, 152036382, 171120329, 13835130, 134582776, 6101815, 63954968, 15164458, 127956187, 43440294, 17815212, 157297525, 35856053, 119285432, 40716963, 14923752, 125052686, 549368315, 64892670, 8443124, 80008942, 48000000]
Getting all the Profit in Dollars of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
profit_cur1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        profit_cur1.append(Drama_DataFrame.Profit_x[i])
print(profit_cur1)
['$47,784', '$59,068,724', '$284,604,712', '$72,678,948', '$70,975,239', '$10,531,500', '$4,609,597', '$36,918,287', '$447,351,353', '$70,986,904', '$285,937,718', '$176,601,214', '$33,102,988', '$26,696,000', '$35,694,916', '$4,344,615', '$6,741,732', '$74,830,111', '$10,948,425', '$120,587,063', '$34,605,762', '$32,973,297', '$69,137,047', '$62,667,874', '$83,269,971', '$120,036,382', '$81,120,329', '$3,835,130', '$118,582,776', '$3,101,815', '$48,954,968', '$5,164,458', '$107,956,187', '$31,440,294', '$12,815,212', '$150,297,525', '$21,856,053', '$104,285,432', '$28,716,963', '$7,423,752', '$108,052,686', '$544,368,315', '$42,892,670', '$3,943,124', '$71,808,942', '$20,000,000']
Getting all the Profit in Integer of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
profit_int1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        profit_int1.append(int(Drama_DataFrame.Profit[i]))
print(profit_int1)
[47784, 59068724, 284604712, 72678948, 70975239, 10531500, 4609597, 36918287, 447351353, 70986904, 285937718, 176601214, 33102988, 26696000, 35694916, 4344615, 6741732, 74830111, 10948425, 120587063, 34605762, 32973297, 69137047, 62667874, 83269971, 120036382, 81120329, 3835130, 118582776, 3101815, 48954968, 5164458, 107956187, 31440294, 12815212, 150297525, 21856053, 104285432, 28716963, 7423752, 108052686, 544368315, 42892670, 3943124, 71808942, 20000000]
Creating a list consisting of 'PG' repeated 46 times for the PG-rated category, because it has 46 movies, for the new dataframe that will be created below.
size_1 = []
for i in range(46):
    size_1.append('PG')
print(size_1)
['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG']
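The R-rated and G-rated label lists in this notebook are built with `list('R'*56)` and `list('G'*25)`, while the PG list above uses a loop. That may be because `list()` applied to a string splits it into single characters, which only gives the intended result for one-character labels. The short sketch below shows the difference and a form that works for labels of any length; the repeat count of 2 is used only to keep the output short.

```python
# list() on a repeated string yields one element per CHARACTER.
split_chars = list('PG' * 2)
print(split_chars)

# Repeating a one-element list yields one element per LABEL.
repeated_labels = ['PG'] * 2
print(repeated_labels)

# For a single-character label both forms happen to agree.
print(list('R' * 2) == ['R'] * 2)
```

Using `['PG'] * 46` (or `['PG-13'] * 64`) would therefore give the same list as the loop.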
Getting all the Net Profit Margin of the movies that are PG-rated from the 'Drama_DataFrame' dataframe.
npm1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] > 0:
        npm1.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(npm1)
[0, 61, 93, 78, 95, 86, 47, 78, 82, 95, 93, 81, 86, 98, 75, 22, 17, 65, 57, 87, 53, 98, 77, 96, 78, 78, 47, 27, 88, 50, 76, 34, 84, 72, 71, 95, 60, 87, 70, 49, 86, 99, 66, 46, 89, 41]
Converting the list consisting of Net Profit Margin of all the PG-rated movies from integer to percentage.
npm1_percent = []
for i in npm1:
    npm1_percent.append("{:}%".format(i))
print(npm1_percent)
['0%', '61%', '93%', '78%', '95%', '86%', '47%', '78%', '82%', '95%', '93%', '81%', '86%', '98%', '75%', '22%', '17%', '65%', '57%', '87%', '53%', '98%', '77%', '96%', '78%', '78%', '47%', '27%', '88%', '50%', '76%', '34%', '84%', '72%', '71%', '95%', '60%', '87%', '70%', '49%', '86%', '99%', '66%', '46%', '89%', '41%']
Creating a list consisting of 'Revenue' repeated 46 times and 'Profit' repeated 46 times for the PG-rated category, because it has 46 movies, for the new dataframe that will be created below.
pg_rate = []
for i in range(46):
    pg_rate.append('Revenue')
for i in range(46):
    pg_rate.append('Profit')
print(pg_rate)
['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
These are the variables needed to create the columns for the third dataframe: 'df3'
Getting all the Names of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
name2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        name2.append(Drama_DataFrame.Movie[i])
print(name2)
['A Sunday in the Country', 'Prancer', 'The Rookie', 'Beauty and the Beast 1991', 'The Little Rascals', 'Ramona and Beezus', 'The Black Stallion', 'The Hunchback of Notre Drame', 'Babe', 'Pollyanna', 'Lassie Come Home', "Charlotte's Web", 'Kit Kittredge: An American Girl', 'The Rookie', 'The Secret Garden', 'The Sound of Music', 'The Tale of Despereaux', 'The Lion King 1994', 'Bambi 1942', 'My Fair Lady 1964', "Hachiko: A Dog's Story", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Three Cions in the Fountain']
Getting all the Worldwide Revenue in Dollars of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
world_cur2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        world_cur2.append(Drama_DataFrame.Worldwide_Gross_x[i])
print(world_cur2)
['$2,411,143', '$18,587,135', '$80,693,537', '$438,656,843', '$66,947,950', '$27,469,621', '$37,799,643', '$325,500,000', '$246,100,000', '$3,750,000', '$4,517,000', '$143,985,708', '$17,657,973', '$80,491,516', '$311,281,000', '$286,214,195', '$90,482,317', '$986,214,868', '$268,000,000', '$72,071,636', '$47,707,417', '$30,194,409', '$65,500,000', '$7,600,377', '$12,000,000']
Getting all the Worldwide Revenue in Integer of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
world_int2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        world_int2.append(Drama_DataFrame.Worldwide_Gross[i])
print(world_int2)
[2411143, 18587135, 80693537, 438656843, 66947950, 27469621, 37799643, 325500000, 246100000, 3750000, 4517000, 143985708, 17657973, 80491516, 311281000, 286214195, 90482317, 986214868, 268000000, 72071636, 47707417, 30194409, 65500000, 7600377, 12000000]
Getting all the Profit in Dollars of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
profit_cur2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        profit_cur2.append(Drama_DataFrame.Profit_x[i])
print(profit_cur2)
['$1,711,143', '$11,587,135', '$58,693,537', '$418,656,843', '$43,947,950', '$12,469,621', '$35,099,643', '$255,500,000', '$216,100,000', '$1,250,000', '$3,851,000', '$58,985,708', '$7,657,973', '$58,491,516', '$293,281,000', '$278,014,195', '$30,482,317', '$941,214,868', '$267,142,000', '$55,071,636', '$37,707,417', '$23,794,409', '$52,500,000', '$5,850,377', '$10,300,000']
Getting all the Profit in Integer of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
profit_int2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        profit_int2.append(int(Drama_DataFrame.Profit[i]))
print(profit_int2)
[1711143, 11587135, 58693537, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, 3851000, 58985708, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, 37707417, 23794409, 52500000, 5850377, 10300000]
Creating a list consisting of 'G' repeated 25 times for the G-rated category due to it having 25 movies for the new dataframe that will be created below.
size_2 = list('G' * 25)
print(size_2)
['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']
Getting all the Net Profit Margin of the movies that are G-rated from the 'Drama_DataFrame' dataframe.
npm2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] > 0:
        npm2.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(npm2)
[70, 62, 72, 95, 65, 45, 92, 78, 87, 33, 85, 40, 43, 72, 94, 97, 33, 95, 99, 76, 79, 78, 80, 76, 85]
Converting the list consisting of Net Profit Margin of all the G-rated movies from integer to percentage.
npm2_percent = []
for i in npm2:
    npm2_percent.append("{:}%".format(i))
print(npm2_percent)
['70%', '62%', '72%', '95%', '65%', '45%', '92%', '78%', '87%', '33%', '85%', '40%', '43%', '72%', '94%', '97%', '33%', '95%', '99%', '76%', '79%', '78%', '80%', '76%', '85%']
Creating a list consisting of 'Revenue' repeated 25 times and 'Profit' repeated 25 times for the G-rated category, because it has 25 movies, for the new dataframe that will be created below.
g_rate = []
for i in range(25):
    g_rate.append('Revenue')
for i in range(25):
    g_rate.append('Profit')
print(g_rate)
['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
These are the variables needed to create the columns for the fourth dataframe: 'df4'
Getting all the Names of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
name3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        name3.append(Drama_DataFrame.Movie[i])
print(name3)
['Gravity', 'Sing', 'Contagion', 'Burlesque', 'Creed II', 'The Post', 'Hereafter', 'Anna Karenina', 'Arrival', 'Charlie St. Cloud', 'Bridge of Spies', 'The Impossible', 'Water for Elephants', 'Creed', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Tree of Life', 'The Longest Ride', 'Step Up Revolution', 'The Vow', 'The Age of Adaline', 'Safe Haven', 'The Best of Me', 'The Help', 'Dear John', 'The Lucky One', 'The Giver', 'Draft Day', 'Rings', 'Fences', 'Me Before You', 'The Light Between Oceans', 'The Book Thief', 'A Quiet Place', 'Beastly', 'The Roommate', 'Remember Me', 'The Woman in Black', 'Country Strong', 'One Day', 'Suffragette', 'The Perks of Being a Wallflower', 'Project Almanac', 'Wish Upon', 'If I Stay', 'Brooklyn', 'Everything, Everything', 'Mud', 'Amour', 'Ouija: Origin of Evil', 'Black or White', 'The Bye Bye Man', 'Gifted', 'The Words', 'Lights Out', 'Still Alice', 'Before I Fall', 'Rabbit Hole', 'Ida', 'Courageous', 'Mustang', 'Like Crazy', 'Another Earth']
Getting all the Worldwide Revenue in Dollars of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
world_cur3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        world_cur3.append(Drama_DataFrame.Worldwide_Gross_x[i])
print(world_cur3)
['$693,698,673', '$634,454,789', '$137,551,594', '$90,552,675', '$213,591,522', '$179,748,880', '$108,660,270', '$71,004,627', '$203,127,894', '$48,478,084', '$162,498,338', '$169,590,606', '$116,809,717', '$173,567,581', '$97,143,987', '$85,309,093', '$252,276,928', '$61,721,826', '$63,802,928', '$165,552,290', '$197,618,160', '$68,984,536', '$94,050,951', '$41,059,418', '$213,120,004', '$142,033,509', '$96,633,833', '$66,540,205', '$29,847,480', '$82,917,283', '$64,282,881', '$208,265,198', '$22,281,732', '$76,086,711', '$334,522,294', '$38,028,230', '$52,545,707', '$56,506,120', '$128,955,898', '$20,601,987', '$59,168,692', '$34,044,909', '$33,069,303', '$32,909,437', '$23,477,345', '$78,356,170', '$62,076,141', '$61,603,136', '$31,556,959', '$36,787,044', '$81,831,866', '$21,971,021', '$31,187,727', '$36,964,656', '$16,369,708', '$148,806,510', '$41,699,612', '$18,945,682', '$6,205,034', '$15,298,355', '$35,185,884', '$5,552,584', '$3,728,400', '$2,102,779']
Getting all the Worldwide Revenue in Integer of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
world_int3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        world_int3.append(Drama_DataFrame.Worldwide_Gross[i])
print(world_int3)
[693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 48478084, 162498338, 169590606, 116809717, 173567581, 97143987, 85309093, 252276928, 61721826, 63802928, 165552290, 197618160, 68984536, 94050951, 41059418, 213120004, 142033509, 96633833, 66540205, 29847480, 82917283, 64282881, 208265198, 22281732, 76086711, 334522294, 38028230, 52545707, 56506120, 128955898, 20601987, 59168692, 34044909, 33069303, 32909437, 23477345, 78356170, 62076141, 61603136, 31556959, 36787044, 81831866, 21971021, 31187727, 36964656, 16369708, 148806510, 41699612, 18945682, 6205034, 15298355, 35185884, 5552584, 3728400, 2102779]
Getting all the Profit in Dollars of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
profit_cur3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        profit_cur3.append(Drama_DataFrame.Profit_x[i])
print(profit_cur3)
['$583,698,673', '$559,454,789', '$77,551,594', '$35,552,675', '$163,591,522', '$129,748,880', '$58,660,270', '$22,004,627', '$156,127,894', '$4,478,084', '$122,498,338', '$129,590,606', '$78,809,717', '$136,567,581', '$60,143,987', '$49,309,093', '$217,276,928', '$26,721,826', '$29,802,928', '$132,552,290', '$167,618,160', '$38,984,536', '$66,050,951', '$15,059,418', '$188,120,004', '$117,033,509', '$71,633,833', '$41,540,205', '$4,847,480', '$57,917,283', '$40,282,881', '$188,265,198', '$2,281,732', '$57,086,711', '$317,522,294', '$21,028,230', '$36,545,707', '$40,506,120', '$113,955,898', '$5,601,987', '$44,168,692', '$20,044,909', '$20,069,303', '$20,909,437', '$11,477,345', '$67,356,170', '$51,076,141', '$51,603,136', '$21,556,959', '$27,087,044', '$72,831,866', '$12,971,021', '$23,787,727', '$29,964,656', '$10,369,708', '$143,806,510', '$36,699,612', '$13,945,682', '$1,205,034', '$12,698,355', '$33,185,884', '$4,152,584', '$3,478,400', '$1,927,779']
Getting all the Profit in Integer of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
profit_int3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        profit_int3.append(int(Drama_DataFrame.Profit[i]))
print(profit_int3)
[583698673, 559454789, 77551594, 35552675, 163591522, 129748880, 58660270, 22004627, 156127894, 4478084, 122498338, 129590606, 78809717, 136567581, 60143987, 49309093, 217276928, 26721826, 29802928, 132552290, 167618160, 38984536, 66050951, 15059418, 188120004, 117033509, 71633833, 41540205, 4847480, 57917283, 40282881, 188265198, 2281732, 57086711, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 44168692, 20044909, 20069303, 20909437, 11477345, 67356170, 51076141, 51603136, 21556959, 27087044, 72831866, 12971021, 23787727, 29964656, 10369708, 143806510, 36699612, 13945682, 1205034, 12698355, 33185884, 4152584, 3478400, 1927779]
Getting all the Net Profit Margin of the movies that are PG-13 rated from the 'Drama_DataFrame' dataframe.
npm3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] > 0:
        npm3.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(npm3)
[84, 88, 56, 39, 76, 72, 53, 30, 76, 9, 75, 76, 67, 78, 61, 57, 86, 43, 46, 80, 84, 56, 70, 36, 88, 82, 74, 62, 16, 69, 62, 90, 10, 75, 94, 55, 69, 71, 88, 27, 74, 58, 60, 63, 48, 85, 82, 83, 68, 73, 89, 59, 76, 81, 63, 96, 88, 73, 19, 83, 94, 74, 93, 91]
Creating a list consisting of 'PG-13' repeated 64 times for the PG-13 rated category due to it having 64 movies for the new dataframe that will be created below.
size_3 = []
for i in range(64):
    size_3.append('PG-13')
print(size_3)
['PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13']
Converting the list consisting of Net Profit Margin of all the PG-13 rated movies from integer to percentage.
npm3_percent = []
for i in npm3:
    npm3_percent.append("{:}%".format(i))
print(npm3_percent)
['84%', '88%', '56%', '39%', '76%', '72%', '53%', '30%', '76%', '9%', '75%', '76%', '67%', '78%', '61%', '57%', '86%', '43%', '46%', '80%', '84%', '56%', '70%', '36%', '88%', '82%', '74%', '62%', '16%', '69%', '62%', '90%', '10%', '75%', '94%', '55%', '69%', '71%', '88%', '27%', '74%', '58%', '60%', '63%', '48%', '85%', '82%', '83%', '68%', '73%', '89%', '59%', '76%', '81%', '63%', '96%', '88%', '73%', '19%', '83%', '94%', '74%', '93%', '91%']
Creating a list consisting of 'Revenue' repeated 64 times and 'Profit' repeated 64 times for the PG-13 rated category, because it has 64 movies, for the new dataframe that will be created below.
pg13_rate = []
for i in range(64):
    pg13_rate.append('Revenue')
for i in range(64):
    pg13_rate.append('Profit')
print(pg13_rate)
['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
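The repeated-label lists above can also be built without explicit loops. The following is a minimal sketch rather than the notebook's own code: `n_movies` is a hypothetical name standing in for the hard-coded 64 (in the notebook it could plausibly be `len(name3)`), and `npm3_sample` holds only the first three margins printed earlier.

```python
# Hypothetical stand-in for the hard-coded count used above.
n_movies = 64

# List multiplication repeats a label without an explicit loop.
size_3 = ['PG-13'] * n_movies
pg13_rate = ['Revenue'] * n_movies + ['Profit'] * n_movies

# First three PG-13 margins from the printed list, used as a short sample.
npm3_sample = [84, 88, 56]
npm3_percent = [f"{value}%" for value in npm3_sample]

print(len(size_3), len(pg13_rate))
print(npm3_percent)
```

Deriving the count from the data instead of typing it avoids a mismatch if the number of PG-13 movies changes.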
These are the variables needed to create the columns for the fifth dataframe: 'df5'
Getting the names of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
name4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        name4.append(Drama_DataFrame.Movie[i])
print(name4)
['Shame', 'Matador', 'Whore', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Kids', 'Crash', 'The Dreamers', 'Lust, Caution', 'Shame', 'Blue Is the Warmest Colour', 'The Dreamers', 'Shame', 'Blue Is the Warmest Colour', 'Blue Valentine', 'Two Girls and a Guy', 'Elles', 'Hell', 'Se, jie', 'The Evil Dead', 'Shame', 'Arabian Nights', 'Natural Born Killers', 'Clerks', 'Bad Lieutenant', 'Beyond the Valley of the Dolls', 'Kids', 'Crash', 'Last Tango in Paris', 'Pink Flamingos', 'Lust, Caution ', 'Happiness 1998', 'Whore 1991', 'Law of Desire']
Getting the Worldwide Revenue, as dollar-formatted strings, of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
world_cur4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        world_cur4.append(Drama_DataFrame.Worldwide_Gross_x[i])
print(world_cur4)
['$20,412,841', '$17,356,268', '$1,008,404', '$277,845', '$1,614,784', '$20,412,216', '$98,410,061', '$15,121,165', '$67,091,915', '$20,412,841', '$19,465,835', '$15,307,113', '$20,412,841', '$19,465,835', '$16,566,240', '$2,315,026', '$3,822,241', '$213,120,004', '$65,167,430', '$2,661,944', '$20,412,841', '$3,453,416', '$50,283,563', '$3,894,240', '$2,038,916', '$9,000,000', '$20,412,216', '$101,173,038', '$36,147,711', '$413,802', '$65,167,430', '$5,746,453', '$1,008,404', '$1,470,809']
Getting the Worldwide Revenue, as integers, of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
world_int4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        world_int4.append(Drama_DataFrame.Worldwide_Gross[i])
print(world_int4)
[20412841, 17356268, 1008404, 277845, 1614784, 20412216, 98410061, 15121165, 67091915, 20412841, 19465835, 15307113, 20412841, 19465835, 16566240, 2315026, 3822241, 213120004, 65167430, 2661944, 20412841, 3453416, 50283563, 3894240, 2038916, 9000000, 20412216, 101173038, 36147711, 413802, 65167430, 5746453, 1008404, 1470809]
Getting the Profit, as dollar-formatted strings, of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
profit_cur4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        profit_cur4.append(Drama_DataFrame.Profit_x[i])
print(profit_cur4)
['$13,912,841', '$4,856,268', '$8,404', '$257,845', '$659,312', '$18,912,216', '$89,410,061', '$121,165', '$52,091,915', '$13,912,841', '$15,465,835', '$307,113', '$13,912,841', '$15,390,895', '$15,566,240', '$1,315,026', '$256,669', '$201,120,004', '$50,167,430', '$2,311,944', '$13,912,841', '$2,548,651', '$16,283,563', '$3,664,240', '$1,038,916', '$8,000,000', '$18,912,216', '$94,673,038', '$34,897,711', '$401,802', '$50,167,430', '$3,546,453', '$958,404', '$858,737']
Getting the Profit, as integers, of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
profit_int4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        profit_int4.append(int(Drama_DataFrame.Profit[i]))
print(profit_int4)
[13912841, 4856268, 8404, 257845, 659312, 18912216, 89410061, 121165, 52091915, 13912841, 15465835, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, 50167430, 2311944, 13912841, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, 958404, 858737]
Getting the Net Profit Margin (profit divided by worldwide revenue, times 100, truncated to a whole number) of all the NC-17 rated movies that made a profit from the 'Drama_DataFrame' dataframe.
npm4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] > 0:
        npm4.append(int(Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i] * 100))
print(npm4)
[68, 27, 0, 92, 40, 92, 90, 0, 77, 68, 79, 2, 68, 79, 93, 56, 6, 94, 76, 86, 68, 73, 32, 94, 50, 88, 92, 93, 96, 97, 76, 61, 95, 58]
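The margin computed above is profit divided by worldwide revenue, times 100, and `int(...)` truncates rather than rounds. A minimal sketch of the difference, using the revenue and profit figures of the first three NC-17 movies printed above (the variable names here are illustrative, not the notebook's):

```python
# Revenue and profit of 'Shame', 'Matador' and 'Whore', copied from the printed lists.
revenue = [20412841, 17356268, 1008404]
profit = [13912841, 4856268, 8404]

# int() drops the fractional part, which is what the notebook's loop does.
truncated = [int(p / r * 100) for p, r in zip(profit, revenue)]

# round() keeps one decimal place for comparison.
rounded = [round(p / r * 100, 1) for p, r in zip(profit, revenue)]

print(truncated)  # [68, 27, 0]
print(rounded)    # [68.2, 28.0, 0.8]
```

Truncation explains why a movie with a small but positive profit can appear with a margin of 0%.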
Creating a list with 'NC-17' repeated 34 times, one entry for each of the 34 NC-17 rated movies, for the new dataframe that will be created below.
size_4 = []
for i in list(range(34)):
    size_4.append('NC-17')
print(size_4)
['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']
Converting the Net Profit Margin of every NC-17 rated movie from an integer to a percentage string.
npm4_percent = []
for i in npm4:
    npm4_percent.append("{:}%".format(i))
print(npm4_percent)
['68%', '27%', '0%', '92%', '40%', '92%', '90%', '0%', '77%', '68%', '79%', '2%', '68%', '79%', '93%', '56%', '6%', '94%', '76%', '86%', '68%', '73%', '32%', '94%', '50%', '88%', '92%', '93%', '96%', '97%', '76%', '61%', '95%', '58%']
Creating a list with 'Revenue' repeated 34 times followed by 'Profit' repeated 34 times, one pair of labels for each of the 34 NC-17 rated movies, for the new dataframe that will be created below.
nc17_rate = []
for i in list(range(34)):
    nc17_rate.append('Revenue')
for i in list(range(34)):
    nc17_rate.append('Profit')
print(nc17_rate)
['Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Revenue', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit', 'Profit']
Creating dataframes df1, df2, df3, df4 and df5.
df5 = pd.DataFrame({'Name':name4,'Revenue':world_cur4, "Profit":profit_cur4,
'Int Revenue':world_int4, "Int Profit":profit_int4,'System Ratings':size_4,
'Net Profit Margin %':npm4_percent})
df4 = pd.DataFrame({'Name':name3,'Revenue':world_cur3, "Profit":profit_cur3,
'Int Revenue':world_int3, "Int Profit":profit_int3,'System Ratings':size_3,
'Net Profit Margin %':npm3_percent})
df3 = pd.DataFrame({'Name':name2,'Revenue':world_cur2, "Profit":profit_cur2,
'Int Revenue':world_int2, "Int Profit":profit_int2,'System Ratings':size_2,
'Net Profit Margin %':npm2_percent})
df2 = pd.DataFrame({'Name':name1,'Revenue':world_cur1, "Profit":profit_cur1,
'Int Revenue':world_int1, "Int Profit":profit_int1,'System Ratings':size_1,
'Net Profit Margin %':npm1_percent})
df1 = pd.DataFrame({'Name':name,'Revenue':world_cur, "Profit":profit_cur,
'Int Revenue':world_int, "Int Profit":profit_int,'System Ratings':size,
'Net Profit Margin %':npm_percent})
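The five dataframes share the same columns and differ only in the rating they filter on, so the per-rating lists could be replaced by one helper. This is a sketch under assumptions: `rating_frame` and `sample` are hypothetical names, the column names mirror those read from `Drama_DataFrame` above, and the third sample row is invented to show the profit filter.

```python
import pandas as pd

def rating_frame(source, rating):
    """Return the profitable movies with the given rating, laid out like df1 to df5."""
    rows = source[(source['Rating'] == rating) & (source['Profit'] > 0)]
    # Same truncating margin as the loops above: int(profit / revenue * 100).
    margin = (rows['Profit'] / rows['Worldwide_Gross'] * 100).astype(int)
    return pd.DataFrame({
        'Name': rows['Movie'],
        'Revenue': rows['Worldwide_Gross_x'],
        'Profit': rows['Profit_x'],
        'Int Revenue': rows['Worldwide_Gross'],
        'Int Profit': rows['Profit'].astype(int),
        'System Ratings': rating,
        'Net Profit Margin %': margin.astype(str) + '%',
    }).reset_index(drop=True)

# Two real rows from the output above plus one invented loss-making row.
sample = pd.DataFrame({
    'Movie': ['Shame', 'Matador', 'Hypothetical Flop'],
    'Rating': ['NC-17', 'NC-17', 'NC-17'],
    'Worldwide_Gross_x': ['$20,412,841', '$17,356,268', '$1,000,000'],
    'Profit_x': ['$13,912,841', '$4,856,268', '-$500,000'],
    'Worldwide_Gross': [20412841, 17356268, 1000000],
    'Profit': [13912841, 4856268, -500000],
})
df5_sketch = rating_frame(sample, 'NC-17')
print(df5_sketch)
```

With a helper like this, each rating needs one call instead of eight separate list-building cells.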
The 'df1' dataframe. (this dataframe is interactive)
df1
Copying the Net Profit Margin list of the R-rated movies into npm_sort and sorting it in ascending order.
npm_sort = sorted(npm)
print(npm_sort)
[0, 4, 4, 11, 13, 17, 18, 25, 26, 27, 28, 29, 30, 34, 39, 42, 44, 56, 57, 60, 64, 66, 66, 68, 68, 70, 71, 72, 72, 77, 80, 80, 80, 81, 82, 83, 83, 85, 85, 85, 87, 87, 87, 88, 89, 89, 89, 90, 91, 91, 92, 93, 93, 94, 96, 96]
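Getting, for each value in the sorted npm_sort list, its position in the original npm list. Because list.index returns only the first match, movies that share the same Net Profit Margin all map to the same position, so that row is repeated; the repeats are dropped further below.
index_npm = []
for x, i in enumerate(npm_sort):
    if i in npm:
        index_npm.append(npm.index(i))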
print(index_npm)
[21, 30, 30, 54, 46, 53, 55, 8, 5, 12, 2, 15, 11, 27, 19, 51, 13, 25, 14, 6, 18, 17, 17, 41, 41, 10, 28, 22, 22, 0, 9, 9, 9, 45, 38, 1, 1, 3, 3, 3, 35, 35, 35, 50, 26, 26, 26, 42, 32, 32, 7, 31, 31, 52, 16, 16]
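The repeated positions in the output (for example 30, 30) come from `list.index`, which always returns the first match. A minimal sketch of an alternative that keeps every position exactly once, using a short hypothetical list with a tie instead of the full `npm` list:

```python
# Hypothetical margins with a tie, to keep the example short.
npm_example = [72, 4, 85, 4]

# Approach used above: look each sorted value up with list.index.
first_match = [npm_example.index(value) for value in sorted(npm_example)]

# Alternative: sort the positions themselves by the value they point to.
# sorted() is stable, so tied values keep their original relative order.
stable_order = sorted(range(len(npm_example)), key=lambda pos: npm_example[pos])

print(first_match)   # [1, 1, 0, 2]
print(stable_order)  # [1, 3, 0, 2]
```

With the first approach the movie at position 3 never appears and the movie at position 1 appears twice. The second list can be passed to `iloc` in the same way without creating repeated rows.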
Re-arranging the dataframe df1 by Net Profit Margin in ascending order and storing the result in a new dataframe, df_r1.
df_r1 = df1.iloc[index_npm]
Resetting the index of the df_r1 dataframe.
df_r1 = df_r1.reset_index()
Deleting unwanted columns from the newly created dataframe df_r1.
del df_r1['index']
del df_r1['Int Revenue']
del df_r1['Int Profit']
del df_r1['System Ratings']
Deleting the duplicated rows from the dataframe df_r1 and resetting its index.
df_r1 = df_r1.drop([2, 22, 24, 30, 36, 37, 40, 38, 41, 44, 45, 48, 51, 54, 28, 31])
df_r1 = df_r1.reset_index()
Deleting the 'index' column that reset_index adds to the dataframe.
del df_r1['index']
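The hard-coded row numbers passed to `drop` have to be found by hand and change whenever the data changes. As a hedged alternative, pandas can remove fully repeated rows itself, and `reset_index(drop=True)` avoids creating the extra 'index' column. The small frame below is a stand-in for df_r1, using two titles from this analysis:

```python
import pandas as pd

# Stand-in for df_r1 with one repeated row.
rows = pd.DataFrame({
    'Name': ['Stoker', 'Take Shelter', 'Take Shelter'],
    'Revenue': ['$12,034,913', '$4,972,016', '$4,972,016'],
    'Profit': ['$34,913', '$222,016', '$222,016'],
    'Net Profit Margin %': ['0%', '4%', '4%'],
})

# Keep the first copy of each identical row, then renumber from 0
# without adding an 'index' column.
deduped = rows.drop_duplicates().reset_index(drop=True)
print(deduped)
```

Note that this only removes rows that are identical in every column. It does not restore movies that were skipped because they share a margin with another movie.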
The first ten rows in the df_r1 dataframe.
df_r1.head(10)
| | Name | Revenue | Profit | Net Profit Margin % |
|---|---|---|---|---|
| 0 | Stoker | $12,034,913 | $34,913 | 0% |
| 1 | Take Shelter | $4,972,016 | $222,016 | 4% |
| 2 | Take Shelter | $4,972,016 | $222,016 | 4% |
| 3 | Rich and Famous | $13,000,000 | $1,500,000 | 11% |
| 4 | Palo Alto | $1,156,309 | $156,309 | 13% |
| 5 | Zoot Suit | $3,256,082 | $556,082 | 17% |
| 6 | Raggedy Man | $11,000,000 | $2,000,000 | 18% |
| 7 | The Master | $50,647,416 | $13,147,416 | 25% |
| 8 | Crimson Peak | $74,966,854 | $19,966,854 | 26% |
| 9 | The Water Diviner | $31,054,727 | $8,554,727 | 27% |
Grouping the integer Net Profit Margin values of the R-rated movies into five labelled categories: 0% - 20%, 20% - 40%, 40% - 60%, 60% - 80% and 80% - 100%. Each category includes its lower bound and excludes its upper bound.
cat_npm = []
for i in npm:
    if 0 <= i < 20: cat_npm.append('0% - 20%')
    if 20 <= i < 40: cat_npm.append('20% - 40%')
    if 40 <= i < 60: cat_npm.append('40% - 60%')
    if 60 <= i < 80: cat_npm.append('60% - 80%')
    if 80 <= i < 100: cat_npm.append('80% - 100%')
print(cat_npm)
['60% - 80%', '80% - 100%', '20% - 40%', '80% - 100%', '80% - 100%', '20% - 40%', '60% - 80%', '80% - 100%', '20% - 40%', '80% - 100%', '60% - 80%', '20% - 40%', '20% - 40%', '40% - 60%', '40% - 60%', '20% - 40%', '80% - 100%', '60% - 80%', '60% - 80%', '20% - 40%', '60% - 80%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '40% - 60%', '80% - 100%', '20% - 40%', '60% - 80%', '60% - 80%', '0% - 20%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '80% - 100%', '80% - 100%', '0% - 20%', '60% - 80%', '80% - 100%', '80% - 100%', '80% - 100%', '40% - 60%', '80% - 100%', '0% - 20%', '0% - 20%', '0% - 20%']
Using Counter to count how many R-rated movies fall into each category: 7 have a Net Profit Margin of '0% - 20%', 8 of '20% - 40%', 4 of '40% - 60%', 11 of '60% - 80%' and 26 of '80% - 100%'.
Counter(cat_npm)
Counter({'60% - 80%': 11,
'80% - 100%': 26,
'20% - 40%': 8,
'40% - 60%': 4,
'0% - 20%': 7})
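The five `if` lines can be collapsed into one small function based on integer division. This is a sketch, not the notebook's code: `npm_band` is a hypothetical helper, and unlike the chain above it places a margin of exactly 100 in the top band instead of leaving it uncategorised. The input is the sorted R-rated list printed earlier.

```python
from collections import Counter

def npm_band(value, width=20):
    """Return a label such as '20% - 40%' for an integer margin between 0 and 100."""
    # Cap the band number so that 100 falls into the last band.
    low = min(value // width, 100 // width - 1) * width
    return f"{low}% - {low + width}%"

# Sorted Net Profit Margins of the R-rated movies, copied from the output above.
npm_sorted = [0, 4, 4, 11, 13, 17, 18, 25, 26, 27, 28, 29, 30, 34, 39, 42, 44,
              56, 57, 60, 64, 66, 66, 68, 68, 70, 71, 72, 72, 77, 80, 80, 80,
              81, 82, 83, 83, 85, 85, 85, 87, 87, 87, 88, 89, 89, 89, 90, 91,
              91, 92, 93, 93, 94, 96, 96]

band_counts = Counter(npm_band(value) for value in npm_sorted)
print(band_counts)
```

The counts agree with the Counter output above: 7, 8, 4, 11 and 26 from the lowest band to the highest.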
Styling the first 20 rows of the df_r1 dataframe as df_r2, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:6, :] = 'background-color:#EFB8B8;color:black' + border    # 0% - 20%
    df.iloc[6:14, :] = 'background-color:#E66A6A;color:black' + border   # 20% - 40%
    df.iloc[14:18, :] = 'background-color:#FF0000;color:white' + border  # 40% - 60%
    df.iloc[18:20, :] = 'background-color:#C20404;color:white' + border  # 60% - 80%
    return df
df_r2 = df_r1[:20].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_r2 dataframe to the df_r2.png file as an image to be used for the analysis later on.
dfi.export(df_r2, 'df_r2.png')
The df_r2 dataframe.
Styling the remaining rows of the df_r1 dataframe as df_r3, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:6, :] = 'background-color:#C20404;color:white' + border   # 60% - 80%
    df.iloc[6:20, :] = 'background-color:#690000;color:white' + border  # 80% - 100%
    return df
df_r3 = df_r1[20:].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_r3 dataframe to the df_r3.png file as an image to be used for the analysis later on.
dfi.export(df_r3, 'df_r3.png')
The df_r3 dataframe.
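In the two styling functions above, the colours appear to follow the five margin categories, with the row positions typed in by hand. A minimal sketch of deriving the colour from the margin itself is shown here; `BAND_CSS` and `band_styles` are hypothetical names, and the two sample rows are taken from the df_r1 table.

```python
import pandas as pd

BORDER = ';border-bottom: 2px solid black'
# One style per 20-point band, lowest band first (colours from the cells above).
BAND_CSS = [
    'background-color:#EFB8B8;color:black' + BORDER,
    'background-color:#E66A6A;color:black' + BORDER,
    'background-color:#FF0000;color:white' + BORDER,
    'background-color:#C20404;color:white' + BORDER,
    'background-color:#690000;color:white' + BORDER,
]

def band_styles(frame, column='Net Profit Margin %'):
    """Return a frame of CSS strings shaped like `frame`, coloured by margin band."""
    margins = frame[column].str.rstrip('%').astype(int)
    css = pd.DataFrame('', index=frame.index, columns=frame.columns)
    for position, margin in enumerate(margins):
        # Margins of 100 are capped into the last band.
        css.iloc[position, :] = BAND_CSS[min(margin // 20, 4)]
    return css

sample = pd.DataFrame({
    'Name': ['Stoker', 'The Master'],
    'Net Profit Margin %': ['0%', '25%'],
})
styles = band_styles(sample)
print(styles)
```

A frame like this has the shape that `Styler.apply(..., axis=None)` expects, so a call such as `df_r1[:20].style.apply(band_styles, axis=None)` could stand in for the position-based functions, and the same function would work for every rating's palette if the colour list were passed in.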
The 'df2' dataframe. (this dataframe is interactive)
df2
Copying the Net Profit Margin list of the PG-rated movies into npm_sort1 and sorting it in ascending order.
npm_sort1 = sorted(npm1)
print(npm_sort1)
[0, 17, 22, 27, 34, 41, 46, 47, 47, 49, 50, 53, 57, 60, 61, 65, 66, 70, 71, 72, 75, 76, 77, 78, 78, 78, 78, 81, 82, 84, 86, 86, 86, 87, 87, 88, 89, 93, 93, 95, 95, 95, 96, 98, 98, 99]
Getting the position in npm1 of each value in the sorted npm_sort1 list. Tied values all return the position of their first match.
index_npm1 = []
for x, i in enumerate(npm_sort1):
    if i in npm1:
        index_npm1.append(npm1.index(i))
print(index_npm1)
[0, 16, 15, 27, 31, 45, 43, 6, 6, 39, 29, 20, 18, 36, 1, 17, 42, 38, 34, 33, 14, 30, 22, 3, 3, 3, 3, 11, 8, 32, 5, 5, 5, 19, 19, 28, 44, 2, 2, 4, 4, 4, 23, 13, 13, 41]
Re-arranging the dataframe df2 by Net Profit Margin in ascending order and storing the result in a new dataframe, df_pg.
df_pg = df2.iloc[index_npm1]
Resetting the index of the df_pg dataframe.
df_pg = df_pg.reset_index()
Deleting unwanted columns from the newly created dataframe df_pg.
del df_pg['index']
del df_pg['Int Revenue']
del df_pg['Int Profit']
del df_pg['System Ratings']
Deleting the duplicated rows from the dataframe df_pg and resetting its index.
df_pg = df_pg.drop([7, 22, 23, 24, 25, 30, 31, 33, 37, 39, 40, 43, 45])
df_pg = df_pg.reset_index()
Deleting the 'index' column that reset_index adds to the dataframe.
del df_pg['index']
Shortening the name of one movie so that it fits in the dataframe. The assignment uses .loc to avoid chained indexing.
df_pg.loc[8, 'Name'] = 'The Night the Lights...'  # full title: The Night the Lights Went Out in Georgia
The first ten rows in the df_pg dataframe.
df_pg.head(10)
| | Name | Revenue | Profit | Net Profit Margin % |
|---|---|---|---|---|
| 0 | Hugo | $180,047,784 | $47,784 | 0% |
| 1 | Dreamer | $38,741,732 | $6,741,732 | 17% |
| 2 | Tuck Everlasting | $19,344,615 | $4,344,615 | 22% |
| 3 | The Spanish Prisoner | $13,835,130 | $3,835,130 | 27% |
| 4 | Pure Country | $15,164,458 | $5,164,458 | 34% |
| 5 | The Natural | $48,000,000 | $20,000,000 | 41% |
| 6 | Tender Mercies | $8,443,124 | $3,943,124 | 46% |
| 7 | Somewhere in Time | $9,709,597 | $4,609,597 | 47% |
| 8 | The Night the Lights... | $14,923,752 | $7,423,752 | 49% |
| 9 | The Secret of Roan Inish | $6,101,815 | $3,101,815 | 50% |
Grouping the integer Net Profit Margin values of the PG-rated movies into five labelled categories: 0% - 20%, 20% - 40%, 40% - 60%, 60% - 80% and 80% - 100%. Each category includes its lower bound and excludes its upper bound.
cat_npm1 = []
for i in npm1:
    if 0 <= i < 20: cat_npm1.append('0% - 20%')
    if 20 <= i < 40: cat_npm1.append('20% - 40%')
    if 40 <= i < 60: cat_npm1.append('40% - 60%')
    if 60 <= i < 80: cat_npm1.append('60% - 80%')
    if 80 <= i < 100: cat_npm1.append('80% - 100%')
Using Counter to count how many PG-rated movies fall into each category: 2 have a Net Profit Margin of '0% - 20%', 3 of '20% - 40%', 8 of '40% - 60%', 14 of '60% - 80%' and 19 of '80% - 100%'.
Counter(cat_npm1)
Counter({'0% - 20%': 2,
'60% - 80%': 14,
'80% - 100%': 19,
'40% - 60%': 8,
'20% - 40%': 3})
Styling the first 16 rows of the df_pg dataframe as df_pg1, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:2, :] = 'background-color:#F8C9B4;color:black' + border    # 0% - 20%
    df.iloc[2:5, :] = 'background-color:#F5966B;color:black' + border    # 20% - 40%
    df.iloc[5:12, :] = 'background-color:#FF5000;color:black' + border   # 40% - 60%
    df.iloc[12:16, :] = 'background-color:#C33F03;color:white' + border  # 60% - 80%
    return df
df_pg1 = df_pg[:16].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_pg1 dataframe to the df_pg1.png file as an image to be used for the analysis later on.
dfi.export(df_pg1, 'df_pg1.png')
The df_pg1 dataframe.
Styling the remaining rows of the df_pg dataframe as df_pg2, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:6, :] = 'background-color:#C33F03;color:white' + border   # 60% - 80%
    df.iloc[6:17, :] = 'background-color:#8C2E02;color:white' + border  # 80% - 100%
    return df
df_pg2 = df_pg[16:].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_pg2 dataframe to the df_pg2.png file as an image to be used for the analysis later on.
dfi.export(df_pg2, 'df_pg2.png')
The df_pg2 dataframe.
The 'df3' dataframe. (this dataframe is interactive)
df3
Copying the Net Profit Margin list of the G-rated movies into npm_sort2 and sorting it in ascending order.
npm_sort2 = sorted(npm2)
print(npm_sort2)
[33, 33, 40, 43, 45, 62, 65, 70, 72, 72, 76, 76, 78, 78, 79, 80, 85, 85, 87, 92, 94, 95, 95, 97, 99]
Getting the position in npm2 of each value in the sorted npm_sort2 list. Tied values all return the position of their first match.
index_npm2 = []
for x, i in enumerate(npm_sort2):
    if i in npm2:
        index_npm2.append(npm2.index(i))
print(index_npm2)
[9, 9, 11, 12, 5, 1, 4, 0, 2, 2, 19, 19, 7, 7, 20, 22, 10, 10, 8, 6, 14, 3, 3, 15, 18]
Re-arranging the dataframe df3 by Net Profit Margin in ascending order and storing the result in a new dataframe, df_g.
df_g = df3.iloc[index_npm2]
Resetting the index of the df_g dataframe.
df_g = df_g.reset_index()
Deleting unwanted columns from the newly created dataframe df_g.
del df_g['index']
del df_g['Int Revenue']
del df_g['Int Profit']
del df_g['System Ratings']
Deleting the duplicated rows from the dataframe df_g and resetting its index.
df_g = df_g.drop([1, 9, 11, 13, 17, 22])
df_g = df_g.reset_index()
Deleting the 'index' column that reset_index adds to the dataframe.
del df_g['index']
The first ten rows in the df_g dataframe.
df_g.head(10)
| | Name | Revenue | Profit | Net Profit Margin % |
|---|---|---|---|---|
| 0 | Pollyanna | $3,750,000 | $1,250,000 | 33% |
| 1 | Charlotte's Web | $143,985,708 | $58,985,708 | 40% |
| 2 | Kit Kittredge: An American Girl | $17,657,973 | $7,657,973 | 43% |
| 3 | Ramona and Beezus | $27,469,621 | $12,469,621 | 45% |
| 4 | Prancer | $18,587,135 | $11,587,135 | 62% |
| 5 | The Little Rascals | $66,947,950 | $43,947,950 | 65% |
| 6 | A Sunday in the Country | $2,411,143 | $1,711,143 | 70% |
| 7 | The Rookie | $80,693,537 | $58,693,537 | 72% |
| 8 | My Fair Lady 1964 | $72,071,636 | $55,071,636 | 76% |
| 9 | The Hunchback of Notre Drame | $325,500,000 | $255,500,000 | 78% |
Styling the first 9 rows of the df_g dataframe as df_g1, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:1, :] = 'background-color:#F1B5B4;color:black' + border  # 20% - 40%
    df.iloc[1:4, :] = 'background-color:#ff6961;color:black' + border  # 40% - 60%
    df.iloc[4:9, :] = 'background-color:#ef3038;color:black' + border  # 60% - 80%
    return df
df_g1 = df_g[:9].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_g1 dataframe to the df_g1.png file as an image to be used for the analysis later on.
dfi.export(df_g1, 'df_g1.png')
The df_g1 dataframe.
Styling the remaining rows of the df_g dataframe as df_g2, using a function that colours each row by its position.
def highlight_cells13(x):
    # Return a frame of CSS strings with the same shape as x.
    border = ';border-bottom: 2px solid black'
    df = x.copy()
    df.loc[:, :] = ''
    df.iloc[0:2, :] = 'background-color:#ef3038;color:black' + border   # 60% - 80%
    df.iloc[2:10, :] = 'background-color:#cc6666;color:white' + border  # 80% - 100%
    return df
df_g2 = df_g[9:].style.set_table_styles([
    {'selector': '', 'props': [('border', '3px solid #FFFAF0')]},  # outer table border
    {'selector': 'thead', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # heading
    {'selector': 'th.row_heading', 'props': [('background-color', '#FFFAF0'), ('color', 'black')]},  # index
]).apply(highlight_cells13, axis=None)
Saving the df_g2 dataframe to the df_g2.png file as an image to be used for the analysis later on.
dfi.export(df_g2, 'df_g2.png')
The df_g2 dataframe.
The 'df4' dataframe. (this dataframe is interactive)
df4
Copying the Net Profit Margin list of the PG-13 rated movies into npm_sort3 and sorting it in ascending order.
npm_sort3 = sorted(npm3)
print(npm_sort3)
[9, 10, 16, 19, 27, 30, 36, 39, 43, 46, 48, 53, 55, 56, 56, 57, 58, 59, 60, 61, 62, 62, 63, 63, 67, 68, 69, 69, 70, 71, 72, 73, 73, 74, 74, 74, 75, 75, 76, 76, 76, 76, 78, 80, 81, 82, 82, 83, 83, 84, 84, 85, 86, 88, 88, 88, 88, 89, 90, 91, 93, 94, 94, 96]
Getting the position in npm3 of each value in the sorted npm_sort3 list. Tied values all return the position of their first match.
index_npm3 = []
for x, i in enumerate(npm_sort3):
    if i in npm3:
        index_npm3.append(npm3.index(i))
print(index_npm3)
[9, 32, 28, 58, 39, 7, 23, 3, 17, 18, 44, 6, 35, 2, 2, 15, 41, 51, 42, 14, 27, 27, 43, 43, 12, 48, 29, 29, 22, 37, 5, 49, 49, 26, 26, 26, 10, 10, 4, 4, 4, 4, 13, 19, 53, 25, 25, 47, 47, 0, 0, 45, 16, 1, 1, 1, 1, 50, 31, 63, 62, 34, 34, 55]
Re-arranging the dataframe df4 by Net Profit Margin in ascending order and storing the result in a new dataframe, df_pg13.
df_pg13 = df4.iloc[index_npm3]
Resetting the index of the df_pg13 dataframe.
df_pg13 = df_pg13.reset_index()
Deleting unwanted columns from the newly created dataframe df_pg13.
del df_pg13['index']
del df_pg13['Int Revenue']
del df_pg13['Int Profit']
del df_pg13['System Ratings']
Deleting the duplicated rows from the dataframe df_pg13 and resetting its index.
df_pg13 = df_pg13.drop([14, 20, 22, 26, 31, 33, 34, 36, 38, 39, 40, 45, 48, 53, 54, 55, 61])
df_pg13 = df_pg13.reset_index()
Deleting the 'index' column that reset_index adds to the dataframe.
del df_pg13['index']
The first ten rows in the df_pg13 dataframe.
df_pg13.head(10)
| | Name | Revenue | Profit | Net Profit Margin % |
|---|---|---|---|---|
| 0 | Charlie St. Cloud | $48,478,084 | $4,478,084 | 9% |
| 1 | The Light Between Oceans | $22,281,732 | $2,281,732 | 10% |
| 2 | Draft Day | $29,847,480 | $4,847,480 | 16% |
| 3 | Rabbit Hole | $6,205,034 | $1,205,034 | 19% |
| 4 | Country Strong | $20,601,987 | $5,601,987 | 27% |
| 5 | Anna Karenina | $71,004,627 | $22,004,627 | 30% |
| 6 | The Best of Me | $41,059,418 | $15,059,418 | 36% |
| 7 | Burlesque | $90,552,675 | $35,552,675 | 39% |
| 8 | The Tree of Life | $61,721,826 | $26,721,826 | 43% |
| 9 | The Longest Ride | $63,802,928 | $29,802,928 | 46% |
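The margins in the table above appear consistent with profit divided by revenue, truncated to a whole percent. The sketch below checks that reading against three of the rows shown; the `parse_dollars` and `net_profit_margin` helpers are hypothetical names, and the truncation rule is an inference from the displayed values rather than something the notebook states:

```python
def parse_dollars(text):
    """Turn a formatted amount such as '$48,478,084' into an int."""
    return int(text.replace('$', '').replace(',', ''))


def net_profit_margin(revenue, profit):
    """Profit as a whole-percent share of revenue, truncated.

    Assumes a non-negative profit, as in the tables above.
    """
    return profit * 100 // revenue


# (name, revenue, profit) copied from the df_pg13 rows above
rows = [
    ('Charlie St. Cloud', '$48,478,084', '$4,478,084'),
    ('Anna Karenina', '$71,004,627', '$22,004,627'),
    ('The Longest Ride', '$63,802,928', '$29,802,928'),
]

for name, revenue, profit in rows:
    margin = net_profit_margin(parse_dollars(revenue), parse_dollars(profit))
    print(f'{name}: {margin}%')
```

Anna Karenina is the telling row: its ratio is just under 31%, yet the table shows 30%, which points to truncation rather than rounding.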
Styling the df_pg131 dataframe with a function that colours each row by its position, using one colour per Net Profit Margin bracket.
def highlight_cells13(x):
    # Same-shaped frame of CSS strings; every cell starts unstyled
    df = x.copy()
    df.loc[:,:] = ''
    border = ';border-bottom: 2px solid black'
    df.iloc[0:4,:] = 'background-color:#E97451;color:black' + border    # rows 0-3
    df.iloc[4:8,:] = 'background-color:#CD5C5C;color:black' + border    # rows 4-7
    df.iloc[8:17,:] = 'background-color:#B22222;color:white' + border   # rows 8-16
    df.iloc[17:24,:] = 'background-color:#C04000;color:white' + border  # rows 17-23
    return df
df_pg131 = df_pg13[:24].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
    {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
    #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
    ])\
    .apply(highlight_cells13, axis=None)
Saving the df_pg131 dataframe to the df_pg131.png file as an image to be used for the analysis later on.
dfi.export(df_pg131, 'df_pg131.png')
The df_pg131 dataframe.
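The row positions in the styling function are hard-coded, so they have to be recounted whenever the data changes. A hedged sketch of choosing the colour from the margin itself, assuming each bracket includes its lower edge (0-19, 20-39, and so on); the `band_style` helper is hypothetical, and the palette is copied from the PG-13 tables:

```python
# (exclusive upper edge, CSS) per bracket; the edges are an assumption
PG13_BANDS = [
    (20, 'background-color:#E97451;color:black'),
    (40, 'background-color:#CD5C5C;color:black'),
    (60, 'background-color:#B22222;color:white'),
    (80, 'background-color:#C04000;color:white'),
    (101, 'background-color:#8B0000;color:white'),
]
BORDER = ';border-bottom: 2px solid black'


def band_style(npm_percent, bands=PG13_BANDS):
    """Return the CSS string for a margin given in whole percent (0-100)."""
    if npm_percent < 0:
        raise ValueError(f'margin below the first bracket: {npm_percent}')
    for upper, css in bands:
        if npm_percent < upper:
            return css + BORDER
    raise ValueError(f'margin above the last bracket: {npm_percent}')


for margin in (9, 39, 60, 96):
    print(margin, band_style(margin))
```

Inside a Styler function, the returned string could be assigned to every cell of the matching row, so the two halves of the table would no longer need separate hand-counted index ranges.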
Styling the df_pg132 dataframe with a function that colours each row by its position, using one colour per Net Profit Margin bracket.
def highlight_cells13(x):
    # Same-shaped frame of CSS strings; every cell starts unstyled
    df = x.copy()
    df.loc[:,:] = ''
    border = ';border-bottom: 2px solid black'
    df.iloc[0:8,:] = 'background-color:#C04000;color:white' + border    # rows 0-7
    df.iloc[8:23,:] = 'background-color:#8B0000;color:white' + border   # rows 8-22
    return df
df_pg132 = df_pg13[24:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
    {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
    #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
    ])\
    .apply(highlight_cells13, axis=None)
Saving the df_pg132 dataframe to the df_pg132.png file as an image to be used for the analysis later on.
dfi.export(df_pg132, 'df_pg132.png')
The df_pg132 dataframe.
The 'df5' dataframe. (this dataframe is interactive)
df5
| Name | Revenue | Profit | Int Revenue | Int Profit | System Ratings | Net Profit Margin % |
|---|---|---|---|---|---|---|
Sorting the Net Profit Margins of the NC-17 rated movies in ascending order and storing them in the npm_sort4 list.
npm_sort4 = sorted(npm4)
print(npm_sort4)
[0, 0, 2, 6, 27, 32, 40, 50, 56, 58, 61, 68, 68, 68, 68, 73, 76, 76, 77, 79, 79, 86, 88, 90, 92, 92, 92, 93, 93, 94, 94, 95, 96, 97]
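Getting, for each value in the sorted npm_sort4 list, its position in the original npm4 list. Because list.index returns the first match, movies that share the same Net Profit Margin all map to the same position.
index_npm4 = []
for i in npm_sort4:
    index_npm4.append(npm4.index(i))
print(index_npm4)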
[2, 2, 11, 16, 1, 22, 4, 24, 15, 33, 31, 0, 0, 0, 0, 21, 18, 18, 8, 10, 10, 19, 25, 6, 3, 3, 3, 14, 14, 17, 17, 32, 28, 29]
Re-arranging the df5 dataframe in ascending order of Net Profit Margin and storing the result in a new dataframe, df_nc17.
df_nc17 = df5.iloc[index_npm4]
Resetting the index of the df_nc17 dataframe.
df_nc17 = df_nc17.reset_index()
Deleting unwanted columns from the newly created df_nc17 dataframe.
del df_nc17['index']
del df_nc17['Int Revenue']
del df_nc17['Int Profit']
del df_nc17['System Ratings']
Deleting the duplicated rows and resetting the index of the df_nc17 dataframe.
df_nc17 = df_nc17.drop([0, 11, 12, 13, 16, 19, 24, 25, 27, 29])
df_nc17 = df_nc17.reset_index()
Deleting the 'index' column, which is added whenever the index of a dataframe is reset.
del df_nc17['index']
The first ten rows in the df_nc17 dataframe.
df_nc17.head(10)
| | Name | Revenue | Profit | Net Profit Margin % |
|---|---|---|---|---|
| 0 | Whore | $1,008,404 | $8,404 | 0% |
| 1 | Whore | $1,008,404 | $8,404 | 0% |
| 2 | The Dreamers | $15,307,113 | $307,113 | 2% |
| 3 | Elles | $3,822,241 | $256,669 | 6% |
| 4 | Matador | $17,356,268 | $4,856,268 | 27% |
| 5 | Natural Born Killers | $50,283,563 | $16,283,563 | 32% |
| 6 | Wide Sargasso Sea | $1,614,784 | $659,312 | 40% |
| 7 | Bad Lieutenant | $2,038,916 | $1,038,916 | 50% |
| 8 | Two Girls and a Guy | $2,315,026 | $1,315,026 | 56% |
| 9 | Law of Desire | $1,470,809 | $858,737 | 58% |
Styling the df_nc171 dataframe with a function that colours each row by its position, using one colour per Net Profit Margin bracket.
def highlight_cells13(x):
    # Same-shaped frame of CSS strings; every cell starts unstyled
    df = x.copy()
    df.loc[:,:] = ''
    border = ';border-bottom: 2px solid black'
    df.iloc[0:3,:] = 'background-color:#FCE6F2;color:black' + border    # rows 0-2
    df.iloc[3:5,:] = 'background-color:#DB7093;color:white' + border    # rows 3-4
    df.iloc[5:9,:] = 'background-color:#E0115F;color:white' + border    # rows 5-8
    df.iloc[9:12,:] = 'background-color:#953553;color:white' + border   # rows 9-11
    return df
df_nc171 = df_nc17[:12].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
    {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
    #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
    ])\
    .apply(highlight_cells13, axis=None)
Saving the df_nc171 dataframe to the df_nc171.png file as an image to be used for the analysis later on.
dfi.export(df_nc171, 'df_nc171.png')
The df_nc171 dataframe.
Styling the df_nc172 dataframe with a function that colours each row by its position, using one colour per Net Profit Margin bracket.
def highlight_cells13(x):
    # Same-shaped frame of CSS strings; every cell starts unstyled
    df = x.copy()
    df.loc[:,:] = ''
    border = ';border-bottom: 2px solid black'
    df.iloc[0:3,:] = 'background-color:#953553;color:white' + border    # rows 0-2
    df.iloc[3:12,:] = 'background-color:#702963;color:white' + border   # rows 3-11
    return df
df_nc172 = df_nc17[12:].style.set_table_styles([{'selector' : '','props' : [('border','3px solid #FFFAF0')]},
    {"selector":"thead", 'props':[("background-color","#FFFAF0"),("color","black")]},#heading
    #{'selector':"td", "props":[("background-color","white"), ("color"," black")]},#inside chart
    {'selector':'th.row_heading', 'props':[('background-color','#FFFAF0'),('color','black')]},#index
    ])\
    .apply(highlight_cells13, axis=None)
Saving the df_nc172 dataframe to the df_nc172.png file as an image to be used for the analysis later on.
dfi.export(df_nc172, 'df_nc172.png')
The df_nc172 dataframe.
This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of R-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart itself is drawn with JavaScript in the next cell. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<span class="highcharts-figure">
<div id="ruth"></div>
<p class="highcharts-description">
</p>
</span>
This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of R-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.
%%js
Highcharts.chart('ruth', {
chart: {
width:650,
height:450,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: '<span style="color:#C20404">Net Profit Margin of R-rated Drama Movies </span>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
tooltip: {
pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Net Profit Margin',
colorByPoint: true,
colors: ['#EFB8B8','#E66A6A','#FF0000','#C20404', '#690000'],
data: [{
name: '0%-20%',
y: 6,
}, {
name: '20%-40%',
y: 8,
},{
name: '40%-60%',
y: 4,
},{
name: '60%-80%',
y: 8,
}, {
name: '80%-100%',
y: 14,
sliced: true,
selected: true
}]
}]
});
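The slice sizes in the chart above are typed in by hand as counts of movies per bracket. A minimal sketch of deriving such counts from a list of margins, assuming each bracket includes its lower edge and a margin of 100 joins the last bracket; the `bracket_counts` helper is hypothetical, and the sample is the first ten PG-13 margins printed earlier in the notebook:

```python
import json
from collections import Counter

LABELS = ['0%-20%', '20%-40%', '40%-60%', '60%-80%', '80%-100%']


def bracket_counts(margins):
    """Count whole-percent margins (0-100) per 20-point bracket.

    Returns a list shaped like the Highcharts `data` option.
    """
    counts = Counter(min(m // 20, 4) for m in margins)
    return [{'name': label, 'y': counts[i]} for i, label in enumerate(LABELS)]


sample = [9, 10, 16, 19, 27, 30, 36, 39, 43, 46]
data = bracket_counts(sample)
print(json.dumps(data, indent=2))
```

The printed JSON could be pasted into the `data` array of a chart, which keeps the slices in step with the dataframe when rows are added or removed.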
This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of PG-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart itself is drawn with JavaScript in the next cell. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<span class="highcharts-figure">
<div id="ruth1"></div>
<p class="highcharts-description">
</p>
</span>
This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of PG-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.
%%js
Highcharts.chart('ruth1', {
chart: {
width:650,
height:450,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: '<span style="color:#FF5000">Net Profit Margin of PG-rated Drama Movies </span>'
},
tooltip: {
pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Net Profit Margin',
colorByPoint: true,
colors: ['#F8C9B4','#F5966B','#FF5000','#C33F03',
'#8C2E02'],
data: [{
name: '0%-20%',
y: 2,
}, {
name: '20%-40%',
y: 3,
},{
name: '40%-60%',
y: 7,
},{
name: '60%-80%',
y: 10,
}, {
name: '80%-100%',
y: 11,
sliced: true,
selected: true
}]
}]
});
This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of G-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart itself is drawn with JavaScript in the next cell. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<span class="highcharts-figure">
<div id="ruth2"></div>
<p class="highcharts-description">
</p>
</span>
This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of G-rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.
%%js
Highcharts.chart('ruth2', {
chart: {
width:650,
height:450,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: '<span style="color:#ef3038">Net Profit Margin of G-rated Drama Movies </span>'
},
tooltip: {
pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Net Profit Margin',
colorByPoint: true,
colors: ['#F1B5B4','#ff6961','#ef3038','#cc6666'],
data: [{
name: '20%-40%',
y: 1,
},{
name: '40%-60%',
y: 3,
},{
name: '60%-80%',
y: 7,
}, {
name: '80%-100%',
y: 8,
sliced: true,
selected: true
}]
}]
});
This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of PG-13 rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart itself is drawn with JavaScript in the next cell. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<span class="highcharts-figure">
<div id="ruth3"></div>
<p class="highcharts-description">
</p>
</span>
This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of PG-13 rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.
%%js
Highcharts.chart('ruth3', {
chart: {
width:650,
height:450,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: '<span style="color:#8B0000">Net Profit Margin of PG-13 Rated Drama Movies </span>'
},
tooltip: {
pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Net Profit Margin',
colorByPoint: true,
colors: ['#E97451','#CD5C5C','#B22222','#C04000',
'#8B0000'],
data: [{
name: '0%-20%',
y: 4,
}, {
name: '20%-40%',
y: 4,
},{
name: '40%-60%',
y: 9,
},{
name: '60%-80%',
y: 15,
}, {
name: '80%-100%',
y: 15,
sliced: true,
selected: true
}]
}]
});
This is the HTML that loads the Highcharts library and creates the container for a pie chart showing the percentage of NC-17 rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%. The chart itself is drawn with JavaScript in the next cell. (The graph is interactive; you can hover over the pie chart.)
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<span class="highcharts-figure">
<div id="ruth4"></div>
<p class="highcharts-description">
</p>
</span>
This is the JavaScript that uses the Highcharts library to draw the pie chart of the percentage of NC-17 rated Drama movies from the 'Drama_DataFrame' dataframe that fall into each Net Profit Margin (NPM) category, ranging from 0% to 100%.
%%js
Highcharts.chart('ruth4', {
chart: {
width:650,
height:450,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: '<span style="color:#E0115F">Net Profit Margin of NC-17 Rated Drama Movies </span>'
},
tooltip: {
pointFormat: '{point.name}: <b>{point.percentage:.1f}%</b>'
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.1f} %'
},
showInLegend: true
}
},
series: [{
name: 'Net Profit Margin',
colorByPoint: true,
colors: ['#FCE6F2','#DB7093','#E0115F','#953553',
'#702963'],
data: [{
name: '0%-20%',
y: 3,
}, {
name: '20%-40%',
y: 2,
},{
name: '40%-60%',
y: 4,
},{
name: '60%-80%',
y: 6,
}, {
name: '80%-100%',
y: 9,
sliced: true,
selected: true
}]
}]
});
This is the blueprint for creating the third visualization, Budget and Revenue of Movies. Altair will be used to create this graph.
Blueprint:
The format of the dataframe needed for this graph is straightforward and follows from how the chart is meant to be read; these are the columns needed for the dataframe:
The style of this chart is a Selection Histogram, which is found in the Altair gallery. It is a scatter plot with the Revenue on the x-axis and the Budget on the y-axis. This setup shows whether the points follow a linear trend, that is, whether the hypothesis that a higher budget leads to a higher revenue is supported by the data.
The graph has an attached histogram showing the number of items in each category within the selection. To make a selection, drag the mouse to draw a box. Hovering over a point shows the movie's name, system rating, budget and revenue.
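The hypothesis that a higher budget goes with a higher revenue can also be checked numerically before plotting. The sketch below computes a Pearson correlation by hand on five PG-13 rows shown earlier in the notebook; the `pearson_r` helper is hypothetical, and the budgets are an assumption, derived here as revenue minus profit rather than read from the dataframe:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Charlie St. Cloud, The Light Between Oceans, Draft Day,
# Rabbit Hole, Country Strong
revenues = [48_478_084, 22_281_732, 29_847_480, 6_205_034, 20_601_987]
profits = [4_478_084, 2_281_732, 4_847_480, 1_205_034, 5_601_987]
budgets = [r - p for r, p in zip(revenues, profits)]  # assumed definition

r = pearson_r(budgets, revenues)
print(round(r, 3))
```

A value close to 1 would be in line with the linear trend the scatter plot is meant to reveal. These five rows are the lowest-margin PG-13 movies, so they only illustrate the calculation and say little about the full dataset.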
This is the 'Drama_DataFrame' dataframe. (this dataframe is interactive)
Drama_DataFrame
| Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Getting the opening weekend of movies that are R-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.
r_opening_weekend = [30122888, 37513109, 14953664, 46607250, 38560195, 13143310, 24400000, 85171450, 736311,
24900566, 10470145, 492648, 1220335, 19497324, 9700000, 5100000, 1443809, 237264, 118298,
224476, 2002165, 160547, 253510, 47122, 13575172, 257174, 256498, 24587, 7485546, 473882,
52041, 387618, 8800230, 561906, 135388, 246914, 6661234, 84797, 156833, 1767308, 81006,
18623, 100268, 3762145, 193728, 137651, 63461, 36134, 104030, 170335, 118150, 13307125,
2105729,63356, 2337594, 287081]
print(r_opening_weekend)#showing the r_opening_weekend list
[30122888, 37513109, 14953664, 46607250, 38560195, 13143310, 24400000, 85171450, 736311, 24900566, 10470145, 492648, 1220335, 19497324, 9700000, 5100000, 1443809, 237264, 118298, 224476, 2002165, 160547, 253510, 47122, 13575172, 257174, 256498, 24587, 7485546, 473882, 52041, 387618, 8800230, 561906, 135388, 246914, 6661234, 84797, 156833, 1767308, 81006, 18623, 100268, 3762145, 193728, 137651, 63461, 36134, 104030, 170335, 118150, 13307125, 2105729, 63356, 2337594, 287081]
Getting the opening weekend of movies that are PG-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.
pg_opening_weekend = [11364505, 19152401, 27547866, 16007426, 11351389, 44542, 1203011, 0, 67877361, 11351389,
27547866, 16755310, 8146533, 0, 12177488, 5268764, 9178233, 13616196, 6011585, 22564512,
9421369, 6836036, 16007426, 9244641, 14466, 24517121, 20584908, 124011, 721341, 82601,
1528982,2739680, 5609875, 298277, 2189966, 89054, 93005, 89213, 0, 2534729, 16015408,
518795, 12146143, 46977, 8556935, 5088381]
print(pg_opening_weekend)#showing the pg_opening_weekend list
[11364505, 19152401, 27547866, 16007426, 11351389, 44542, 1203011, 0, 67877361, 11351389, 27547866, 16755310, 8146533, 0, 12177488, 5268764, 9178233, 13616196, 6011585, 22564512, 9421369, 6836036, 16007426, 9244641, 14466, 24517121, 20584908, 124011, 721341, 82601, 1528982, 2739680, 5609875, 298277, 2189966, 89054, 93005, 89213, 0, 2534729, 16015408, 518795, 12146143, 46977, 8556935, 5088381]
Getting the opening weekend of movies that are G-rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.
g_opening_weekend = [0, 2914486, 16021684, 162146, 10028065, 7810481, 0, 21037414, 8742545, 0, 679185, 11457353
, 220297, 16021684, 4625583, 0, 10103675, 1586753, 0, 0, 0, 0, 0, 0, 0]
print(g_opening_weekend)#showing the g_opening_weekend list
[0, 2914486, 16021684, 162146, 10028065, 7810481, 0, 21037414, 8742545, 0, 679185, 11457353, 220297, 16021684, 4625583, 0, 10103675, 1586753, 0, 0, 0, 0, 0, 0, 0]
Getting the opening weekend of movies that are PG-13 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.
pg13_opening_weekend = [55785112, 35258, 22403596, 11947744, 35574710, 526011, 220522, 320690, 24074047,
12381585, 15371203, 143818, 16842353, 29632823, 14789393, 7102085, 24830443, 372920,
13019686, 11731703, 41202458, 13203458, 21401594, 10003827, 26044590, 30468614,
22618358, 12305016, 9783603, 13002632, 129462, 18723269, 4765838, 105005, 9851102,
15002635, 8089139, 20874072, 30452, 30452, 5079566, 76244, 228359, 8310232, 5467084,
187281, 15679190, 11727390, 2215891, 68266, 14065500, 6213362, 13501349, 446380,
4750894, 21688103, 212000, 4690214, 53778, 55438, 9112839, 20321, 128140, 77740]
print(pg13_opening_weekend)#showing the pg13_opening_weekend list
[55785112, 35258, 22403596, 11947744, 35574710, 526011, 220522, 320690, 24074047, 12381585, 15371203, 143818, 16842353, 29632823, 14789393, 7102085, 24830443, 372920, 13019686, 11731703, 41202458, 13203458, 21401594, 10003827, 26044590, 30468614, 22618358, 12305016, 9783603, 13002632, 129462, 18723269, 4765838, 105005, 9851102, 15002635, 8089139, 20874072, 30452, 30452, 5079566, 76244, 228359, 8310232, 5467084, 187281, 15679190, 11727390, 2215891, 68266, 14065500, 6213362, 13501349, 446380, 4750894, 21688103, 212000, 4690214, 53778, 55438, 9112839, 20321, 128140, 77740]
Getting the opening weekend of movies that are NC-17 rated that are in the Drama Genre from the 'Drama_DataFrame' dataframe. This data was obtained through research.
nc17_opening_weekend = [361000, 69100, 0, 0, 0, 85709, 738339, 143632, 63918, 361000, 100316, 142632, 361000,
100316, 193728, 649423, 24286, 11014818, 63918, 25775847, 361000, 0, 11166687, 31665,
245398, 0, 85709, 738339, 100000, 70188, 63918, 130303, 0, 0]
print(nc17_opening_weekend)#showing the nc17_opening_weekend list
[361000, 69100, 0, 0, 0, 85709, 738339, 143632, 63918, 361000, 100316, 142632, 361000, 100316, 193728, 649423, 24286, 11014818, 63918, 25775847, 361000, 0, 11166687, 31665, 245398, 0, 85709, 738339, 100000, 70188, 63918, 130303, 0, 0]
Getting the Budget of the R-rated Drama movies in the 'Drama_DataFrame' dataframe, keeping only the movies whose Profit is not negative.
r_cost = []
for i, x in enumerate(Drama_DataFrame.Rating):
    # keep R-rated movies that at least broke even
    if x == 'R' and Drama_DataFrame.Profit[i] >= 0:
        r_cost.append(int(Drama_DataFrame.Production_Budget[i]))
print(r_cost)#showing the r_cost list
[100000000, 61000000, 60000000, 55000000, 55000000, 55000000, 52500000, 40000000, 37500000, 31000000, 23000000, 22500000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 13000000, 12000000, 12000000, 12000000, 11800000, 11000000, 10000000, 9400000, 8500000, 7000000, 5000000, 4900000, 4750000, 4000000, 3500000, 3400000, 3300000, 3000000, 2000000, 2000000, 2000000, 2000000, 2000000, 2000000, 1987650, 1500000, 1000000, 1000000, 1000000, 135000, 100000, 6000000, 8500000, 20000000, 100000, 2700000, 11500000, 9000000]
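The same filter is repeated for every rating with only the rating string changed. A minimal sketch of folding it into one helper, assuming the rows are available as plain dictionaries; the `budgets_for` name and the three sample rows are hypothetical stand-ins for records from `Drama_DataFrame`:

```python
def budgets_for(rating, rows):
    """Budgets (as ints) of movies with `rating` and a non-negative profit."""
    return [
        int(row['Production_Budget'])
        for row in rows
        if row['Rating'] == rating and row['Profit'] >= 0
    ]


# Hypothetical records using the dataframe's column names
rows = [
    {'Rating': 'R', 'Production_Budget': 100000000.0, 'Profit': 5.0},
    {'Rating': 'R', 'Production_Budget': 61000000.0, 'Profit': -1.0},
    {'Rating': 'PG', 'Production_Budget': 37000000.0, 'Profit': 0.0},
]

print(budgets_for('R', rows))   # [100000000]
print(budgets_for('PG', rows))  # [37000000]
```

With pandas, `Drama_DataFrame.to_dict('records')` would produce rows of this shape, so one call per rating could stand in for the separate loops.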
Getting the Budget of the PG-rated Drama movies in the 'Drama_DataFrame' dataframe, keeping only the movies whose Profit is not negative.
pg_cost = []
for i, x in enumerate(Drama_DataFrame.Rating):
    # keep PG-rated movies that at least broke even
    if x == 'PG' and Drama_DataFrame.Profit[i] >= 0:
        pg_cost.append(int(Drama_DataFrame.Production_Budget[i]))
print(pg_cost)#showing the pg_cost list
[180000000, 37000000, 20000000, 20000000, 3000000, 1700000, 5100000, 10000000, 95000000, 3000000, 20000000, 40000000, 5000000, 422000, 11800000, 15000000, 32000000, 40000000, 8000000, 17000000, 30000000, 500000, 20000000, 2000000, 23000000, 32000000, 90000000, 10000000, 16000000, 3000000, 15000000, 10000000, 20000000, 12000000, 5000000, 7000000, 14000000, 15000000, 12000000, 7500000, 17000000, 5000000, 22000000, 4500000, 8200000, 28000000]
Getting the budget of G-rated Drama movies that did not lose money (Profit >= 0) from the 'Drama_DataFrame' dataframe.
g_cost = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] >= 0:
        g_cost.append(int(Drama_DataFrame.Production_Budget[i]))
print(g_cost)  # showing the g_cost list
[700000, 7000000, 22000000, 20000000, 23000000, 15000000, 2700000, 70000000, 30000000, 2500000, 666000, 85000000, 10000000, 22000000, 18000000, 8200000, 60000000, 45000000, 858000, 17000000, 10000000, 6400000, 13000000, 1750000, 1700000]
Getting the budget of PG-13 rated Drama movies that did not lose money (Profit >= 0) from the 'Drama_DataFrame' dataframe.
pg13_cost = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] >= 0:
        pg13_cost.append(int(Drama_DataFrame.Production_Budget[i]))
print(pg13_cost)  # showing the pg13_cost list
[110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 44000000, 40000000, 40000000, 38000000, 37000000, 37000000, 36000000, 35000000, 35000000, 34000000, 33000000, 30000000, 30000000, 28000000, 26000000, 25000000, 25000000, 25000000, 25000000, 25000000, 25000000, 24000000, 20000000, 20000000, 19000000, 17000000, 17000000, 16000000, 16000000, 15000000, 15000000, 15000000, 14000000, 13000000, 12000000, 12000000, 11000000, 11000000, 10000000, 10000000, 9700000, 9000000, 9000000, 7400000, 7000000, 6000000, 5000000, 5000000, 5000000, 5000000, 2600000, 2000000, 1400000, 250000, 175000]
Getting the budget of NC-17 rated Drama movies that did not lose money (Profit >= 0) from the 'Drama_DataFrame' dataframe.
nc17_cost = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] >= 0:
        nc17_cost.append(int(Drama_DataFrame.Production_Budget[i]))
print(nc17_cost)  # showing the nc17_cost list
[6500000, 12500000, 1000000, 20000, 955472, 1500000, 9000000, 15000000, 15000000, 6500000, 4000000, 15000000, 6500000, 4074940, 1000000, 1000000, 3565572, 12000000, 15000000, 350000, 6500000, 904765, 34000000, 230000, 1000000, 1000000, 1500000, 6500000, 1250000, 12000, 15000000, 2200000, 50000, 612072]
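The five per-rating loops above all apply the same filter (a given rating and a non-negative profit). As a minimal sketch, the same selection can be written once with a pandas boolean mask. The small `drama` frame and the helper name `budgets_for` are hypothetical stand-ins for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for Drama_DataFrame; the values are illustrative only.
drama = pd.DataFrame({
    "Rating": ["R", "PG", "R", "G", "R"],
    "Profit": [5_000_000, -1_000_000, -200_000, 0, 12_000_000],
    "Production_Budget": [10_000_000, 3_000_000, 2_000_000, 700_000, 61_000_000],
})

def budgets_for(frame, rating):
    """Return the budgets (as ints) of films with this rating and Profit >= 0."""
    mask = (frame["Rating"] == rating) & (frame["Profit"] >= 0)
    return frame.loc[mask, "Production_Budget"].astype(int).tolist()

r_cost = budgets_for(drama, "R")
print(r_cost)  # [10000000, 61000000]
```

Because the mask works on column values rather than on positional lookups such as `Profit[i]`, this form does not depend on the dataframe having a default 0..n-1 index.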
Creating the df_opening dataframe.
df_opening = pd.DataFrame({
    'Budget': r_cost + pg_cost + g_cost + pg13_cost + nc17_cost,
    'Opening_Weekend': (r_opening_weekend + pg_opening_weekend + g_opening_weekend
                        + pg13_opening_weekend + nc17_opening_weekend),
    'Profit': profit_int + profit_int1 + profit_int2 + profit_int3 + profit_int4,
})
The 'df_opening' dataframe, with the columns Budget, Opening_Weekend and Profit.
df_opening
Creating a 3D scatter plot of the Profit, Budget and Opening Weekend of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe. Using the 'animate' function and the 'animation.FuncAnimation' class to create an animated 3D scatter plot object.
def animate(i):
    # azimuth angle : 0 deg to 360 deg
    ax.view_init(elev=10, azim=i*4)
    return fig
# Creating dataset
z = df_opening['Profit']
x = df_opening['Budget']
y = df_opening['Opening_Weekend']
# Creating figure
plt.rcParams['figure.figsize'] = [3.6, 3.6]
# Create the figure explicitly so that 'fig' refers to this plot when it is animated
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Creating plot
ax.scatter3D(x, y, z, color = "red")
#plt.title("simple 3D scatter plot")
ax.set_xlabel('Budget', size = 7.5)
ax.set_ylabel('Opening_Weekend', size = 7.5)
ax.set_zlabel('Profit', size = 7.5)
# show plot
ax.tick_params('z', labelsize=7)
plt.xticks(np.arange(0, 180000000, 25000000))
plt.xticks(fontsize=7)
plt.yticks(fontsize=7)
plt.show()
#Creating the Animation object
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
Saving the animated 3D scatter plot gif as 'drama1.gif'.
# Pillow writes GIFs directly, so ffmpeg is not needed
writergif = animation.PillowWriter(fps=10)
ani.save('drama1.gif', writer=writergif)
The first 3D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Opening Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to see whether the budget and the opening weekend of a movie have an effect on its profit.
Creating a 3D scatter plot with a linear plane of the Profit, Budget and Opening Weekend of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe. Using the 'animate' function and the 'animation.FuncAnimation' class to create an animated 3D scatter plot object.
def init():
    # Plot the scatter points
    ax.scatter(df['Budget'], df['Opening_Weekend'], df['Profit'], alpha=0.5, s=50, color='red')
    #ax.plot_surface(x_surf,y_surf,fittedY, alpha=0.4 ,rstride=1, cstride=1)
    return fig
def animate(i):
    # azimuth angle : 0 deg to 360 deg
    ax.view_init(elev=10, azim=i*4)
    return fig
def func(num, dataSet, line):
    # NOTE: there is no .set_data() for 3 dim data...
    sscatter.set_data(dataSet[0:2, :num])
    sscatter.set_3d_properties(dataSet[2, :num])
    return sscatter
dataSet = np.array([df['Budget'],df['Opening_Weekend'],df['Profit']])
numDataPoints = len(df['Profit'])
fig = plt.figure()
#fig1 = plt.figure()
#ax = Axes3D(fig)
ax = Axes3D(fig)
#scatter = ax.scatter(dataSet[0], dataSet[1], dataSet[2],alpha=0.5, s=50,color='red')
linear = ax.scatter(dataSet[0], dataSet[1], dataSet[2],alpha=0.5, s=40,color='red')
linear = ax.plot_surface(x_surf,y_surf,fittedY, alpha=0.4 ,rstride=1, cstride=1, color='brown')
ax.set_xlabel('Budget')
ax.set_ylabel('Opening_Weekend')
ax.set_zlabel('Profit')
#plt.show(ax,ax1)
#Creating the Animation object
#line_ani = animation.FuncAnimation(fig, func ,frames=numDataPoints, fargs=(dataSet,line), interval=50, blit=False)
#line_ani.save(r'AnimationNeww.gif')
# Animate frames=90
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
C:\Users\rutho\AppData\Local\Temp/ipykernel_24224/68577760.py:24: MatplotlibDeprecationWarning: Axes3D(fig) adding itself to the figure is deprecated since 3.4. Pass the keyword argument auto_add_to_figure=False and use fig.add_axes(ax) to suppress this warning. The default value of auto_add_to_figure will change to False in mpl3.5 and True values will no longer work in 3.6. This is consistent with other Axes classes. ax = Axes3D(fig)
<matplotlib.animation.FuncAnimation at 0x2991d6e4d90>
Saving the animated 3D scatter plot gif with a linear plane as 'drama.gif'.
# Pillow writes GIFs directly, so ffmpeg is not needed
writergif = animation.PillowWriter(fps=10)
ani.save('drama.gif', writer=writergif)
The first 3D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Opening Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to see whether the budget and the opening weekend of a movie have an effect on its profit.
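The plane drawn in part B relies on `x_surf`, `y_surf` and `fittedY`, whose construction is not shown in this section. One common way to obtain such a plane is an ordinary least-squares fit of Profit on Budget and Opening Weekend. The sketch below assumes that approach and uses made-up numbers, so it illustrates the technique rather than reproducing the notebook's actual fit.

```python
import numpy as np

# Illustrative values only (not taken from df_opening).
budget = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
opening = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
# An exact plane, so the fit should recover the coefficients 3.0, 2.0 and 0.5.
profit = 3.0 + 2.0 * budget + 0.5 * opening

# Design matrix with an intercept column: profit ~ c0 + c1*budget + c2*opening
A = np.column_stack([np.ones_like(budget), budget, opening])
coef, *_ = np.linalg.lstsq(A, profit, rcond=None)

# Grid over the observed ranges; these arrays play the role of x_surf / y_surf.
x_surf, y_surf = np.meshgrid(
    np.linspace(budget.min(), budget.max(), 10),
    np.linspace(opening.min(), opening.max(), 10),
)
# Height of the fitted plane at each grid point (the role of fittedY).
fitted = coef[0] + coef[1] * x_surf + coef[2] * y_surf
print(coef)
```

The three 2D arrays can then be passed to `ax.plot_surface(x_surf, y_surf, fitted, alpha=0.4)` to overlay the plane on the scatter points.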
Getting the month in which R-rated Drama movies were released from the 'Drama_DataFrame' dataframe and labeling the months from 1 to 12, going from January to December.
r_month = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R' and Drama_DataFrame.Profit[i] >= 0:
        month = Drama_DataFrame.Release_Date[i][:3]  # three-letter month abbreviation
        if month == 'Jan': r_month.append(1)
        elif month == 'Feb': r_month.append(2)
        elif month == 'Mar': r_month.append(3)
        elif month == 'Apr': r_month.append(4)
        elif month == 'May': r_month.append(5)
        elif month == 'Jun': r_month.append(6)
        elif month == 'Jul': r_month.append(7)
        elif month == 'Aug': r_month.append(8)
        elif month == 'Sep': r_month.append(9)
        elif month == 'Oct': r_month.append(10)
        elif month == 'Nov': r_month.append(11)
        elif month == 'Dec': r_month.append(12)
        else: r_month.append('Nan')
Showing the 'r_month' list.
print(r_month)
[12, 10, 5, 2, 2, 10, 12, 2, 9, 11, 10, 11, 4, 11, 8, 10, 12, 4, 10, 12, 9, 3, 11, 1, 6, 11, 11, 1, 10, 1, 9, 7, 2, 10, 10, 5, 3, 6, 10, 8, 4, 10, 9, 3, 12, 10, 5, 4, 7, 9, 5, 7, 12, 1, 10, 9]
Getting the month in which PG-rated Drama movies were released from the 'Drama_DataFrame' dataframe and labeling the months from 1 to 12, going from January to December.
pg_month = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG' and Drama_DataFrame.Profit[i] >= 0:
        month = Drama_DataFrame.Release_Date[i][:3]  # three-letter month abbreviation
        if month == 'Jan': pg_month.append(1)
        elif month == 'Feb': pg_month.append(2)
        elif month == 'Mar': pg_month.append(3)
        elif month == 'Apr': pg_month.append(4)
        elif month == 'May': pg_month.append(5)
        elif month == 'Jun': pg_month.append(6)
        elif month == 'Jul': pg_month.append(7)
        elif month == 'Aug': pg_month.append(8)
        elif month == 'Sep': pg_month.append(9)
        elif month == 'Oct': pg_month.append(10)
        elif month == 'Nov': pg_month.append(11)
        elif month == 'Dec': pg_month.append(12)
        else: pg_month.append('Nan')
Showing the 'pg_month' list.
print(pg_month)
[11, 9, 11, 3, 8, 2, 10, 6, 3, 8, 11, 12, 8, 12, 1, 10, 10, 6, 4, 2, 11, 9, 3, 3, 1, 7, 7, 5, 1, 2, 11, 10, 12, 10, 7, 9, 12, 2, 12, 6, 5, 7, 7, 3, 2, 5]
Getting the month in which G-rated Drama movies were released from the 'Drama_DataFrame' dataframe and labeling the months from 1 to 12, going from January to December.
g_month = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G' and Drama_DataFrame.Profit[i] >= 0:
        month = Drama_DataFrame.Release_Date[i][:3]  # three-letter month abbreviation
        if month == 'Jan': g_month.append(1)
        elif month == 'Feb': g_month.append(2)
        elif month == 'Mar': g_month.append(3)
        elif month == 'Apr': g_month.append(4)
        elif month == 'May': g_month.append(5)
        elif month == 'Jun': g_month.append(6)
        elif month == 'Jul': g_month.append(7)
        elif month == 'Aug': g_month.append(8)
        elif month == 'Sep': g_month.append(9)
        elif month == 'Oct': g_month.append(10)
        elif month == 'Nov': g_month.append(11)
        elif month == 'Dec': g_month.append(12)
        else: g_month.append('Nan')
Showing the 'g_month' list.
print(g_month)
[4, 11, 3, 11, 8, 7, 10, 6, 8, 5, 12, 10, 7, 3, 4, 4, 12, 6, 8, 12, 3, 11, 10, 9, 5]
Getting the month in which PG-13 rated Drama movies were released from the 'Drama_DataFrame' dataframe and labeling the months from 1 to 12, going from January to December.
pg13_month = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13' and Drama_DataFrame.Profit[i] >= 0:
        month = Drama_DataFrame.Release_Date[i][:3]  # three-letter month abbreviation
        if month == 'Jan': pg13_month.append(1)
        elif month == 'Feb': pg13_month.append(2)
        elif month == 'Mar': pg13_month.append(3)
        elif month == 'Apr': pg13_month.append(4)
        elif month == 'May': pg13_month.append(5)
        elif month == 'Jun': pg13_month.append(6)
        elif month == 'Jul': pg13_month.append(7)
        elif month == 'Aug': pg13_month.append(8)
        elif month == 'Sep': pg13_month.append(9)
        elif month == 'Oct': pg13_month.append(10)
        elif month == 'Nov': pg13_month.append(11)
        elif month == 'Dec': pg13_month.append(12)
        else: pg13_month.append('Nan')
Showing the 'pg13_month' list.
print(pg13_month)
[10, 12, 9, 11, 11, 12, 10, 11, 11, 7, 10, 12, 4, 11, 1, 12, 12, 5, 4, 7, 2, 4, 2, 10, 8, 2, 4, 8, 4, 2, 12, 6, 9, 11, 4, 3, 2, 3, 2, 12, 8, 10, 9, 1, 7, 8, 11, 5, 4, 12, 10, 1, 1, 4, 9, 7, 1, 3, 12, 5, 9, 11, 10, 7]
Getting the month in which NC-17 rated Drama movies were released from the 'Drama_DataFrame' dataframe and labeling the months from 1 to 12, going from January to December.
nc17_month = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17' and Drama_DataFrame.Profit[i] >= 0:
        month = Drama_DataFrame.Release_Date[i][:3]  # three-letter month abbreviation
        if month == 'Jan': nc17_month.append(1)
        elif month == 'Feb': nc17_month.append(2)
        elif month == 'Mar': nc17_month.append(3)
        elif month == 'Apr': nc17_month.append(4)
        elif month == 'May': nc17_month.append(5)
        elif month == 'Jun': nc17_month.append(6)
        elif month == 'Jul': nc17_month.append(7)
        elif month == 'Aug': nc17_month.append(8)
        elif month == 'Sep': nc17_month.append(9)
        elif month == 'Oct': nc17_month.append(10)
        elif month == 'Nov': nc17_month.append(11)
        elif month == 'Dec': nc17_month.append(12)
        # unrecognized dates fall back to 9 here, unlike the 'Nan' used for the other ratings
        else: nc17_month.append(9)
Showing the 'nc17_month' list.
print(nc17_month)
[12, 3, 10, 4, 4, 9, 3, 2, 10, 1, 10, 2, 12, 10, 12, 9, 4, 9, 9, 10, 12, 7, 8, 10, 11, 6, 7, 5, 1, 3, 9, 10, 10, 4]
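The month lists above are all built from the first three characters of the release date. As a minimal sketch, the twelve-branch chain can be replaced by a single dictionary lookup. The helper name `month_number` and the sample dates are hypothetical, and the date format ("Dec 18, 2009") is assumed from the way the notebook slices `Release_Date[i][:3]`.

```python
# Three-letter month abbreviation -> month number
MONTH_NUMBER = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def month_number(release_date, default="Nan"):
    """Map a date string such as 'Dec 18, 2009' to 12; return default if unrecognized."""
    return MONTH_NUMBER.get(release_date[:3], default)

# Hypothetical release dates in the assumed format
dates = ["Dec 18, 2009", "May 20, 2011", "unknown"]
print([month_number(d) for d in dates])  # [12, 5, 'Nan']
```

Passing a different `default` reproduces the NC-17 cell, which falls back to 9 instead of 'Nan'.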
Creating the df_month dataframe.
df_month = pd.DataFrame({
    'Budget': r_cost + pg_cost + g_cost + pg13_cost + nc17_cost,
    'Month_Realesed': r_month + pg_month + g_month + pg13_month + nc17_month,
    'Revenue': world_int + world_int1 + world_int2 + world_int3 + world_int4,
})
The 'df_month' dataframe, with the columns Budget, Month_Realesed and Revenue.
df_month
Creating a 3D scatter plot of the Budget, Month Released and Revenue of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe. Using the 'animate' function and the 'animation.FuncAnimation' class to create an animated 3D scatter plot object.
def init():
    # Plot the scatter points
    ax.scatter(df['Budget'], df['Opening_Weekend'], df['Profit'], alpha=0.5, s=50, color='red')
    #ax.plot_surface(x_surf,y_surf,fittedY, alpha=0.4 ,rstride=1, cstride=1)
    return fig
def animate(i):
    # azimuth angle : 0 deg to 360 deg
    ax.view_init(elev=10, azim=i*4)
    return fig
def func(num, dataSet, line):
    # NOTE: there is no .set_data() for 3 dim data...
    sscatter.set_data(dataSet[0:2, :num])
    sscatter.set_3d_properties(dataSet[2, :num])
    return sscatter
fig = plt.figure(figsize=(20, 15))
fig = plt.figure()
#ax1 = fig.add_subplot(131, projection='3d')
#ax2 = fig.add_subplot(132, projection='3d')
#ax3 = fig.add_subplot(133, projection='3d')
#ax4 = fig.add_subplot(111, projection='3d')
ax = Axes3D(fig)
#axes = [ax1, ax2, ax3]
# Creating plot
#for ax in axes:
cluster = ax.scatter(df['Budget'],df['Month_Realesed'],df['Revenue'], alpha=0.5,s=50, color='#C41E3A')
#ax1.plot_surface(x_surf,y_surf,fittedY, alpha=0.4, rstride=1, cstride=1,color='brown')
cluster = ax.set_xlabel('Budget')
cluster = ax.set_ylabel('Month_Realesed')
cluster = ax.set_zlabel('Revenue')
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
C:\Users\rutho\AppData\Local\Temp/ipykernel_24224/4266163597.py:26: MatplotlibDeprecationWarning: Axes3D(fig) adding itself to the figure is deprecated since 3.4. Pass the keyword argument auto_add_to_figure=False and use fig.add_axes(ax) to suppress this warning. The default value of auto_add_to_figure will change to False in mpl3.5 and True values will no longer work in 3.6. This is consistent with other Axes classes. ax = Axes3D(fig)
<matplotlib.animation.FuncAnimation at 0x29921e0aa30>
<Figure size 1440x1080 with 0 Axes>
Saving the animated 3D scatter plot gif as 'drama2.gif'.
# Pillow writes GIFs directly, so ffmpeg is not needed
writergif = animation.PillowWriter(fps=10)
ani.save('drama2.gif', writer=writergif)
The second 3D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Month_Realesed' and the z-axis is the 'Revenue'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into k clusters, in which each observation belongs to the cluster with the nearest mean. These clusters will then be analyzed by observing the amount of budget spent and revenue generated per cluster.
Getting the sum of squared errors (SSE) for k = 1 to 9 clusters over the Budget, Month Released and Revenue of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe, to determine the optimal number of clusters.
k_rng = range(1, 10)
sse = []
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Budget', 'Month_Realesed', 'Revenue']])
    sse.append(km.inertia_)  # inertia_ is the within-cluster sum of squared distances
Showing the 'sse' list.
sse
[3.9701565672727327e+18, 1.3374768179929006e+18, 6.927203948051537e+17, 4.33795586558498e+17, 3.111289467424199e+17, 1.974580280900467e+17, 1.5998318293649258e+17, 1.3456461461306184e+17, 1.1529437847578322e+17]
Plotting the sum of squared errors (SSE) against the number of clusters k for the movies in the Drama Genre from the 'Drama_DataFrame' dataframe. Using the elbow method, the curve bends most sharply at k = 2, which suggests that the optimal number of clusters is two.
plt.xlabel('Number of clusters (k)')
plt.ylabel('Sum of Squared Error')
plt.plot(k_rng, sse)
[<matplotlib.lines.Line2D at 0x29922085e80>]
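The SSE values above are on the order of 1e17 to 1e18 because Budget and Revenue are measured in dollars while the month only ranges from 1 to 12, so the month barely influences the distances that k-means minimizes. A common remedy, sketched below as an assumption rather than as what this notebook does, is to standardize each column before clustering. The three rows are illustrative values in the same units as 'df_month'.

```python
import numpy as np

# Illustrative rows: budget in dollars, month number, revenue in dollars.
X = np.array([
    [60_000_000, 5, 84_154_026],
    [55_000_000, 2, 381_398_492],
    [22_500_000, 4, 31_054_727],
], dtype=float)

# Share of the total variance contributed by each column before scaling.
var = X.var(axis=0)
print(var / var.sum())  # the month column's share is essentially zero

# z-score each column so that all three features contribute comparably.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.std(axis=0))  # each column now has unit standard deviation
```

With scikit-learn, `sklearn.preprocessing.StandardScaler` performs the same transformation and could be applied to the three columns before calling `KMeans.fit`.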
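Creating the cluster list: movies released from January to June are assigned to cluster 0 and movies released from July to December are assigned to cluster 1.
cluster = []
for i in df_month.Month_Realesed:
    if i in [1, 2, 3, 4, 5, 6]: cluster.append(0)
    elif i in [7, 8, 9, 10, 11, 12]: cluster.append(1)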
Adding the cluster list to the 'df_month' dataframe.
df_month['cluster'] = cluster
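The loop that builds the cluster list appends nothing when a month value is neither 1 to 6 nor 7 to 12 (for example the 'Nan' placeholder), which would leave the list shorter than the dataframe and make the column assignment fail. A vectorized version, sketched here under the assumption that every month is an integer from 1 to 12, always returns one label per row. The `months` series is illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative month numbers; assumes no 'Nan' placeholders are present.
months = pd.Series([12, 10, 5, 2, 7, 6])

# 0 for January-June, 1 for July-December
cluster = np.where(months <= 6, 0, 1)
print(cluster.tolist())  # [1, 1, 0, 0, 1, 0]
```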
The updated 'df_month' dataframe, with the columns Budget, Month_Realesed, Revenue and cluster.
df_month
Creating a 3D scatter plot of the Budget, Month Released and Revenue of the movies that are in the Drama Genre from the 'Drama_DataFrame' dataframe, colored by cluster. Using the 'animate' function and the 'animation.FuncAnimation' class to create an animated 3D scatter plot object.
fig = plt.figure(figsize=(20, 15))
fig = plt.figure()
#ax1 = fig.add_subplot(131, projection='3d')
#ax2 = fig.add_subplot(132, projection='3d')
#ax3 = fig.add_subplot(133, projection='3d')
#ax4 = fig.add_subplot(111, projection='3d')
ax = Axes3D(fig)
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
#ax1 = fig.add_subplot(131, projection='3d')
scatter = ax.scatter(df1['Budget'],df1['Month_Realesed'],df1['Revenue'], alpha=0.5,s=50, color='#C41E3A')
scatter = ax.scatter(df2['Budget'],df2['Month_Realesed'],df2['Revenue'], alpha=0.5,s=50, color='#702963')
scatter = ax.set_xlabel('Budget')
scatter = ax.set_ylabel('Month_Realesed')
scatter = ax.set_zlabel('Revenue')
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
C:\Users\rutho\AppData\Local\Temp/ipykernel_24588/1808098869.py:8: MatplotlibDeprecationWarning: Axes3D(fig) adding itself to the figure is deprecated since 3.4. Pass the keyword argument auto_add_to_figure=False and use fig.add_axes(ax) to suppress this warning. The default value of auto_add_to_figure will change to False in mpl3.5 and True values will no longer work in 3.6. This is consistent with other Axes classes. ax = Axes3D(fig)
<matplotlib.animation.FuncAnimation at 0x2bfc50eb550>
<Figure size 1440x1080 with 0 Axes>
Saving the animated 3D scatter plot gif as 'drama3.gif'.
# Pillow writes GIFs directly, so ffmpeg is not needed
writergif = animation.PillowWriter(fps=10)
ani.save('drama3.gif', writer=writergif)
The second 3D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Month_Realesed' and the z-axis is the 'Revenue'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into k clusters, in which each observation belongs to the cluster with the nearest mean. These clusters will then be analyzed by observing the amount of budget spent and revenue generated per cluster.
Getting the index of all the movies in the Drama Genre that were released from January to June, from the 'df_month' dataframe.
cluster_a_index = []
for i, x in enumerate(df_month.Month_Realesed):
    if x in (1, 2, 3, 4, 5, 6): cluster_a_index.append(i)
print(cluster_a_index)  # showing the cluster_a_index list
[2, 3, 4, 7, 12, 17, 21, 23, 24, 27, 29, 32, 35, 36, 37, 40, 43, 46, 47, 50, 53, 59, 61, 63, 64, 70, 73, 74, 75, 78, 79, 80, 83, 84, 85, 93, 95, 96, 99, 100, 101, 102, 104, 109, 111, 115, 116, 117, 119, 122, 126, 139, 141, 144, 145, 147, 148, 149, 152, 153, 155, 156, 158, 161, 162, 163, 164, 165, 170, 174, 175, 178, 179, 180, 183, 184, 186, 192, 194, 195, 197, 198, 200, 202, 207, 216, 218, 219, 220, 224]
Checking the number of elements in the 'cluster_a_index' list.
len(cluster_a_index)
90
Using the indexes from the 'cluster_a_index' list to get the Month_Realesed, Revenue and Budget of each movie that was released from January to June.
month_a = []
rev_a = []
budg_a = []
for i in cluster_a_index:
    month_a.append(df_month['Month_Realesed'][i])
    rev_a.append(df_month['Revenue'][i])
    budg_a.append(df_month['Budget'][i])
Showing the 'month_a' list.
print(month_a)
[5, 2, 2, 2, 4, 4, 3, 1, 6, 1, 1, 2, 5, 3, 6, 4, 3, 5, 4, 5, 1, 3, 2, 6, 3, 1, 6, 4, 2, 3, 3, 1, 5, 1, 2, 2, 6, 5, 3, 2, 5, 4, 3, 6, 5, 3, 4, 4, 6, 3, 5, 4, 1, 5, 4, 2, 4, 2, 2, 4, 4, 2, 6, 4, 3, 2, 3, 2, 1, 5, 4, 1, 1, 4, 1, 3, 5, 3, 4, 4, 3, 2, 1, 2, 4, 6, 5, 1, 3, 4]
Showing the 'rev_a' list.
print(rev_a)
[84154026, 381398492, 371350619, 570998101, 31054727, 38358392, 12034913, 56178935, 70133905, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 2088390, 14244931, 1156309, 429448, 77211836, 3256082, 92678948, 12231500, 46918287, 542351353, 47494916, 114830111, 18948425, 137587063, 89137047, 64667874, 106269971, 13835130, 134582776, 6101815, 119285432, 14923752, 125052686, 8443124, 80008942, 48000000, 2411143, 80693537, 325500000, 3750000, 80491516, 311281000, 286214195, 986214868, 47707417, 12000000, 116809717, 97143987, 61721826, 63802928, 197618160, 68984536, 94050951, 142033509, 96633833, 29847480, 82917283, 208265198, 334522294, 38028230, 52545707, 56506120, 128955898, 32909437, 61603136, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 277845, 1614784, 98410061, 15121165, 20412841, 15307113, 3822241, 9000000, 101173038, 36147711, 413802, 1470809]
Showing the 'budg_a' list.
print(budg_a)
[60000000, 55000000, 55000000, 40000000, 22500000, 13000000, 12000000, 11000000, 10000000, 7000000, 4900000, 3500000, 3000000, 2000000, 2000000, 2000000, 1500000, 1000000, 135000, 8500000, 2700000, 20000000, 1700000, 10000000, 95000000, 11800000, 40000000, 8000000, 17000000, 20000000, 2000000, 23000000, 10000000, 16000000, 3000000, 15000000, 7500000, 17000000, 4500000, 8200000, 28000000, 700000, 22000000, 70000000, 2500000, 22000000, 18000000, 8200000, 45000000, 10000000, 1700000, 38000000, 37000000, 35000000, 34000000, 30000000, 30000000, 28000000, 25000000, 25000000, 25000000, 25000000, 20000000, 17000000, 17000000, 16000000, 16000000, 15000000, 12000000, 10000000, 10000000, 9000000, 7400000, 7000000, 5000000, 5000000, 2600000, 12500000, 20000, 955472, 9000000, 15000000, 6500000, 15000000, 3565572, 1000000, 6500000, 1250000, 12000, 612072]
Showing how many Drama movies from the 'Drama_DataFrame' dataframe were released in each month from January to June. The counts are stored in a dictionary called 'grouped_month_a'.
grouped_month_a= Counter(month_a)
print(grouped_month_a)#showing the grouped_month_a dictionary
Counter({4: 20, 2: 17, 3: 17, 1: 14, 5: 13, 6: 9})
Showing the frequency of each budget value among the Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June. The counts are computed from the 'budg_a' list.
print(Counter(budg_a))
Counter({10000000: 6, 2000000: 4, 17000000: 4, 15000000: 4, 25000000: 4, 20000000: 3, 16000000: 3, 55000000: 2, 40000000: 2, 12000000: 2, 7000000: 2, 3000000: 2, 1000000: 2, 1700000: 2, 8200000: 2, 28000000: 2, 22000000: 2, 30000000: 2, 9000000: 2, 5000000: 2, 6500000: 2, 60000000: 1, 22500000: 1, 13000000: 1, 11000000: 1, 4900000: 1, 3500000: 1, 1500000: 1, 135000: 1, 8500000: 1, 2700000: 1, 95000000: 1, 11800000: 1, 8000000: 1, 23000000: 1, 7500000: 1, 4500000: 1, 700000: 1, 70000000: 1, 2500000: 1, 18000000: 1, 45000000: 1, 38000000: 1, 37000000: 1, 35000000: 1, 34000000: 1, 7400000: 1, 2600000: 1, 12500000: 1, 20000: 1, 955472: 1, 3565572: 1, 1250000: 1, 12000: 1, 612072: 1})
Getting the minimum budget spent by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June.
min(budg_a)
12000
Creating a function called 'Average' that gets the average value of a list of values.
def Average(lst):
    return sum(lst) / len(lst)
Getting the average budget of Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June.
average_budg_a = Average(budg_a)
average_budg_a #16,082,779
16082779.066666666
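Budgets in this group range from $12,000 to $95 million, so a few large productions pull the average upward. As a small illustration using five values taken from the 'budg_a' list, Python's standard `statistics` module gives both the mean and the median, and the median is less sensitive to such extremes.

```python
from statistics import mean, median

# Five budgets taken from the budg_a list above
budgets = [60_000_000, 55_000_000, 22_500_000, 135_000, 12_000]

print(mean(budgets))    # 27529400
print(median(budgets))  # 22500000
```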
Getting the index of Drama movies released from January to June that have a budget of up to $20 million.
group_one_index = []
for i in cluster_a_index:
    if 0 <= df_month['Budget'][i] <= 20000000:
        group_one_index.append(i)
print(group_one_index)  # showing the group_one_index list
[17, 21, 23, 24, 27, 29, 32, 35, 36, 37, 40, 43, 46, 47, 50, 53, 59, 61, 63, 70, 74, 75, 78, 79, 83, 84, 85, 93, 95, 96, 99, 100, 102, 111, 116, 117, 122, 126, 158, 161, 162, 163, 164, 165, 170, 174, 175, 178, 179, 180, 183, 184, 186, 192, 194, 195, 197, 198, 200, 202, 207, 216, 218, 219, 220, 224]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
66
Getting the index of Drama movies released from January to June that have a budget greater than $20 million.
group_two_index = []
for i in cluster_a_index:
    if 20000001 <= df_month['Budget'][i]:
        group_two_index.append(i)
print(group_two_index)  # showing the group_two_index list
[2, 3, 4, 7, 12, 64, 73, 80, 101, 104, 109, 115, 119, 139, 141, 144, 145, 147, 148, 149, 152, 153, 155, 156]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
24
Creating a function called 'round_to_multiple' that rounds a value to the nearest multiple of a chosen number.
def round_to_multiple(number, multiple):
    return multiple * round(number / multiple)
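One detail worth knowing about 'round_to_multiple': Python 3's built-in `round` rounds exact halves to the nearest even integer, so a value that lies exactly halfway between two multiples may round down. The sketch below shows this and, as an optional alternative that is not part of the original notebook, a hypothetical `round_half_up_to_multiple` helper that always rounds halves upward.

```python
import math

def round_to_multiple(number, multiple):
    return multiple * round(number / multiple)

def round_half_up_to_multiple(number, multiple):
    """Hypothetical variant: exact halves always round up."""
    return multiple * math.floor(number / multiple + 0.5)

print(round_to_multiple(38_358_392, 10_000_000))          # 40000000
print(round_to_multiple(25_000_000, 10_000_000))          # 20000000 (2.5 rounds to 2)
print(round_to_multiple(35_000_000, 10_000_000))          # 40000000 (3.5 rounds to 4)
print(round_half_up_to_multiple(25_000_000, 10_000_000))  # 30000000
```

For the revenue figures in this notebook the difference only matters when a value falls exactly on a half multiple, which is rare for real box-office numbers.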
Using the 'round_to_multiple' function to round the revenue of Drama movies to the nearest 10 million, for movies that were released from January to June and have a budget of up to $20 million.
rev_a_one = []
for i in group_one_index:
    rev_a_one.append(round_to_multiple(df_month['Revenue'][i], 10000000))
print(rev_a_one)  # showing the rev_a_one list
[40000000, 10000000, 60000000, 70000000, 10000000, 20000000, 40000000, 20000000, 20000000, 20000000, 0, 10000000, 0, 0, 80000000, 0, 90000000, 10000000, 50000000, 50000000, 20000000, 140000000, 90000000, 60000000, 10000000, 130000000, 10000000, 120000000, 10000000, 130000000, 10000000, 80000000, 0, 0, 310000000, 290000000, 50000000, 10000000, 210000000, 330000000, 40000000, 50000000, 60000000, 130000000, 30000000, 60000000, 30000000, 20000000, 30000000, 40000000, 40000000, 20000000, 20000000, 20000000, 0, 0, 100000000, 20000000, 20000000, 20000000, 0, 10000000, 100000000, 40000000, 0, 0]
Showing the frequency of each rounded revenue value for Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget of up to $20 million.
print(Counter(rev_a_one))
Counter({20000000: 12, 0: 11, 10000000: 10, 40000000: 6, 60000000: 4, 50000000: 4, 130000000: 3, 30000000: 3, 80000000: 2, 90000000: 2, 100000000: 2, 70000000: 1, 140000000: 1, 120000000: 1, 310000000: 1, 290000000: 1, 210000000: 1, 330000000: 1})
Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget of up to $20 million.
group_one = []
for i in group_one_index:
    group_one.append(df_month['Revenue'][i])
print(group_one)  # showing the group_one list
[38358392, 12034913, 56178935, 70133905, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 2088390, 14244931, 1156309, 429448, 77211836, 3256082, 92678948, 12231500, 46918287, 47494916, 18948425, 137587063, 89137047, 64667874, 13835130, 134582776, 6101815, 119285432, 14923752, 125052686, 8443124, 80008942, 2411143, 3750000, 311281000, 286214195, 47707417, 12000000, 208265198, 334522294, 38028230, 52545707, 56506120, 128955898, 32909437, 61603136, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 277845, 1614784, 98410061, 15121165, 20412841, 15307113, 3822241, 9000000, 101173038, 36147711, 413802, 1470809]
Getting the minimum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget of $100,000 to $20 million.
min(group_one)
277845
# 200,000 - 10,000,000 (#14)
stor1 = []
for i in group_one:
    if 200000 <= i <= 10000000:
        stor1.append(i)
print(stor1)  # showing the stor1 list
[2088390, 1156309, 429448, 3256082, 6101815, 8443124, 2411143, 3750000, 277845, 1614784, 3822241, 9000000, 413802, 1470809]
# 10,000,001 - 50,000,000 (#31)
stor2 = []
for i in group_one:
    if 10000001 <= i <= 50000000:
        stor2.append(i)
print(stor2)  # showing the stor2 list
[38358392, 12034913, 10765283, 17536004, 40454520, 23251930, 16610760, 16131551, 14244931, 12231500, 46918287, 47494916, 18948425, 13835130, 14923752, 47707417, 12000000, 38028230, 32909437, 31556959, 21971021, 31187727, 36964656, 41699612, 18945682, 15298355, 17356268, 15121165, 20412841, 15307113, 36147711]
# 50,000,001 - 100,000,000 (#11)
stor3 = []
for i in group_one:
    if 50000001 <= i <= 100000000:
        stor3.append(i)
print(stor3)  # showing the stor3 list
[56178935, 70133905, 77211836, 92678948, 89137047, 64667874, 80008942, 52545707, 56506120, 61603136, 98410061]
# 100,000,001 - 200,000,000 (#6)
stor4 = []
for i in group_one:
    if 100000001 <= i <= 200000000:
        stor4.append(i)
print(stor4)  # showing the stor4 list
[137587063, 134582776, 119285432, 125052686, 128955898, 101173038]
# 200,000,001 - 300,000,000 (#2)
stor5 = []
for i in group_one:
    if 200000001 <= i <= 300000000:
        stor5.append(i)
print(stor5)  # showing the stor5 list
[286214195, 208265198]
# 300,000,001 - 400,000,000 (#2)
stor6 = []
for i in group_one:
    if 300000001 <= i <= 400000000:
        stor6.append(i)
print(stor6)  # showing the stor6 list
[311281000, 334522294]
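The six range checks above can also be written as a single lookup. This is only a sketch of an alternative, not the notebook's method: the names `UPPER_EDGES` and `revenue_bin` are hypothetical, the edges are copied from the `if` conditions above, and the sample revenues are taken from the 'group_one' output.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper edges of the six revenue ranges used for stor1..stor6.
# (Hypothetical name; the lower bound of 200,000 on the first range is not checked here.)
UPPER_EDGES = [10_000_000, 50_000_000, 100_000_000,
               200_000_000, 300_000_000, 400_000_000]

def revenue_bin(revenue):
    """Return the 0-based index of the first range whose upper edge is >= revenue.

    A revenue above the last edge returns len(UPPER_EDGES), i.e. "no range".
    """
    return bisect_left(UPPER_EDGES, revenue)

# One revenue from each range, copied from the group_one output.
sample = [277845, 12034913, 56178935, 137587063, 286214195, 334522294]
bin_counts = Counter(revenue_bin(r) for r in sample)
print(bin_counts)
```

Applied to the whole 'group_one' list, `bin_counts` would give the size of every range in one pass instead of six separate loops.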
Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from January to June with a budget greater than $21 million to the nearest $50 million.
rev_a_two = []
for i in group_two_index:rev_a_two.append(round_to_multiple(df_month['Revenue'][i],50000000))
print(rev_a_two)#showing the rev_a_two list
[100000000, 400000000, 350000000, 550000000, 50000000, 550000000, 100000000, 100000000, 50000000, 100000000, 350000000, 100000000, 1000000000, 100000000, 100000000, 50000000, 50000000, 200000000, 50000000, 100000000, 150000000, 100000000, 50000000, 100000000]
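The 'round_to_multiple' function is defined elsewhere in the notebook and is not shown in this section. The sketch below is an assumption about how it might work, under a hypothetical name so it does not shadow the real function. It reproduces the rounded values shown in this section, including the tie cases (for example a budget of 250,000 rounding down to 200,000), which matches Python's built-in `round` rounding ties to the nearest even integer.

```python
def round_to_multiple_sketch(number, multiple):
    """Round `number` to the nearest multiple of `multiple`.

    Assumed behavior only: the notebook's own round_to_multiple may differ.
    Ties follow Python's round(), which rounds halves to the nearest even integer.
    """
    return multiple * round(number / multiple)

# Revenues from the group_two output, rounded to the nearest 50 million.
print(round_to_multiple_sketch(84154026, 50_000_000))
print(round_to_multiple_sketch(371350619, 50_000_000))
# A budget that sits exactly halfway between two multiples of 100,000.
print(round_to_multiple_sketch(250000, 100_000))
```

Note that rounding to a coarse multiple maps every revenue below half the multiple to 0, so a 0 in the rounded lists does not mean the movie earned nothing.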
Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 million.
print(Counter(rev_a_two))
Counter({100000000: 10, 50000000: 6, 350000000: 2, 550000000: 2, 400000000: 1, 1000000000: 1, 200000000: 1, 150000000: 1})
Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 million.
group_two = []
for i in group_two_index:group_two.append(df_month['Revenue'][i])
print(group_two)#showing the group_two list
[84154026, 381398492, 371350619, 570998101, 31054727, 542351353, 114830111, 106269971, 48000000, 80693537, 325500000, 80491516, 986214868, 116809717, 97143987, 61721826, 63802928, 197618160, 68984536, 94050951, 142033509, 96633833, 29847480, 82917283]
Getting the maximum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from January to June with a budget greater than $21 million.
max(group_two)
986214868
# 20,000,000 - 100,000,000 (#13)
stor7 = []
for i in group_two:
    if 20000000 <= i <= 100000000:
        stor7.append(i)
print(stor7)  # showing the stor7 list
[84154026, 31054727, 48000000, 80693537, 80491516, 97143987, 61721826, 63802928, 68984536, 94050951, 96633833, 29847480, 82917283]
# 100,000,001 - 200,000,000 (#5)
stor8 = []
for i in group_two:
    if 100000001 <= i <= 200000000:
        stor8.append(i)
print(stor8)  # showing the stor8 list
[114830111, 106269971, 116809717, 197618160, 142033509]
# 200,000,001 - 400,000,000 (#3)
stor9 = []
for i in group_two:
    if 200000001 <= i <= 400000000:
        stor9.append(i)
print(stor9)  # showing the stor9 list
[381398492, 371350619, 325500000]
# 400,000,001 - 600,000,000 (#2)
stor10 = []
for i in group_two:
    if 400000001 <= i <= 600000000:
        stor10.append(i)
print(stor10)  # showing the stor10 list
[570998101, 542351353]
# 900,000,001 - 1,000,000,000 (#1)
stor11 = []
for i in group_two:
    if 900000001 <= i <= 1000000000:
        stor11.append(i)
print(stor11)  # showing the stor11 list
[986214868]
Getting the index of all Drama movies in the 'df_month' dataframe that were released from July to December.
cluster_b_index = []
for i, x in enumerate(df_month.Month_Realesed):
    # months 7 through 12 cover July to December
    if x in (7, 8, 9, 10, 11, 12):
        cluster_b_index.append(i)
print(cluster_b_index)  # showing the cluster_b_index list
[0, 1, 5, 6, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19, 20, 22, 25, 26, 28, 30, 31, 33, 34, 38, 39, 41, 42, 44, 45, 48, 49, 51, 52, 54, 55, 56, 57, 58, 60, 62, 65, 66, 67, 68, 69, 71, 72, 76, 77, 81, 82, 86, 87, 88, 89, 90, 91, 92, 94, 97, 98, 103, 105, 106, 107, 108, 110, 112, 113, 114, 118, 120, 121, 123, 124, 125, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 140, 142, 143, 146, 150, 151, 154, 157, 159, 160, 166, 167, 168, 169, 171, 172, 173, 176, 177, 181, 182, 185, 187, 188, 189, 190, 191, 193, 196, 199, 201, 203, 204, 205, 206, 208, 209, 210, 211, 212, 213, 214, 215, 217, 221, 222, 223]
Checking the number of elements in the 'cluster_b_index' list.
len(cluster_b_index)
135
Using the indexes from the 'cluster_b_index' list to get the Month_Realesed, Revenue and Budget of each movie that was released from July to December.
month_b = []
rev_b = []
budg_b = []
for i in cluster_b_index:
    month_b.append(df_month['Month_Realesed'][i])
    rev_b.append(df_month['Revenue'][i])
    budg_b.append(df_month['Budget'][i])
Showing the 'month_b' list.
print(month_b)
[12, 10, 10, 12, 9, 11, 10, 11, 11, 8, 10, 12, 10, 12, 9, 11, 11, 11, 10, 9, 7, 10, 10, 10, 8, 10, 9, 12, 10, 7, 9, 7, 12, 10, 9, 11, 9, 11, 8, 10, 8, 11, 12, 8, 12, 10, 10, 11, 9, 7, 7, 11, 10, 12, 10, 7, 9, 12, 12, 7, 7, 11, 11, 8, 7, 10, 8, 12, 10, 7, 12, 8, 12, 11, 10, 9, 10, 12, 9, 11, 11, 12, 10, 11, 11, 7, 10, 12, 11, 12, 12, 7, 10, 8, 8, 12, 9, 11, 12, 8, 10, 9, 7, 8, 11, 12, 10, 9, 7, 12, 9, 11, 10, 7, 12, 10, 9, 10, 10, 12, 10, 12, 9, 9, 9, 10, 12, 7, 8, 10, 11, 7, 9, 10, 10]
Showing the 'rev_b' list.
print(rev_b)
[449948323, 368567189, 74966854, 134612435, 50647416, 160558438, 77735925, 32398681, 38017873, 46604054, 28270399, 331266710, 36262783, 19859167, 35830713, 42843521, 21817298, 77733867, 17499242, 4972016, 57273049, 20433227, 38969037, 11295324, 10153415, 6328516, 21270290, 16566240, 5438911, 2769782, 54766923, 34718173, 1951683, 13000000, 11000000, 180047784, 96068724, 304604712, 73975239, 9709597, 73986904, 305937718, 216601214, 38102988, 27118000, 19344615, 38741732, 64605762, 33473297, 152036382, 171120329, 63954968, 15164458, 127956187, 43440294, 17815212, 157297525, 35856053, 40716963, 549368315, 64892670, 18587135, 438656843, 66947950, 27469621, 37799643, 246100000, 4517000, 143985708, 17657973, 90482317, 268000000, 72071636, 30194409, 65500000, 7600377, 693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 48478084, 162498338, 169590606, 173567581, 85309093, 252276928, 165552290, 41059418, 213120004, 66540205, 64282881, 22281732, 76086711, 20601987, 59168692, 34044909, 33069303, 23477345, 78356170, 62076141, 36787044, 81831866, 16369708, 148806510, 6205034, 35185884, 5552584, 3728400, 2102779, 20412841, 1008404, 20412216, 67091915, 19465835, 20412841, 19465835, 16566240, 2315026, 213120004, 65167430, 2661944, 20412841, 3453416, 50283563, 3894240, 2038916, 20412216, 65167430, 5746453, 1008404]
Showing the 'budg_b' list.
print(budg_b)
[100000000, 61000000, 55000000, 52500000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 12000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4750000, 4000000, 3400000, 3300000, 2000000, 2000000, 2000000, 1987650, 1000000, 1000000, 100000, 6000000, 20000000, 100000, 11500000, 9000000, 180000000, 37000000, 20000000, 3000000, 5100000, 3000000, 20000000, 40000000, 5000000, 422000, 15000000, 32000000, 30000000, 500000, 32000000, 90000000, 15000000, 10000000, 20000000, 12000000, 5000000, 7000000, 14000000, 12000000, 5000000, 22000000, 7000000, 20000000, 23000000, 15000000, 2700000, 30000000, 666000, 85000000, 10000000, 60000000, 858000, 17000000, 6400000, 13000000, 1750000, 110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 44000000, 40000000, 40000000, 37000000, 36000000, 35000000, 33000000, 26000000, 25000000, 25000000, 24000000, 20000000, 19000000, 15000000, 15000000, 14000000, 13000000, 12000000, 11000000, 11000000, 9700000, 9000000, 6000000, 5000000, 5000000, 2000000, 1400000, 250000, 175000, 6500000, 1000000, 1500000, 15000000, 4000000, 6500000, 4074940, 1000000, 1000000, 12000000, 15000000, 350000, 6500000, 904765, 34000000, 230000, 1000000, 1500000, 15000000, 2200000, 50000]
Counting how many Drama movies from the 'Drama_DataFrame' dataframe were released in each month from July to December. The counts are stored in a Counter (a dictionary subclass) called 'grouped_month_b'.
grouped_month_b = Counter(month_b)
print(grouped_month_b)  # showing the grouped_month_b Counter
Counter({10: 35, 12: 27, 11: 23, 9: 20, 7: 17, 8: 13})
Getting the minimum budget spent by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.
min(budg_b)
50000
Getting the average budget of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.
average_budg_b = Average(budg_b)
average_budg_b  # about 20.65 million
20651598.22033898
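The 'Average' helper is defined elsewhere in the notebook. A minimal sketch of what it presumably does (an arithmetic mean, which is an assumption) is shown below under a hypothetical name, next to the median for comparison. Movie budgets are strongly skewed, so a few large productions can pull the mean well above the typical budget. The sample is the first ten values of the 'budg_b' output.

```python
from statistics import median

def average_sketch(values):
    """Arithmetic mean; assumed to mirror the notebook's Average helper."""
    return sum(values) / len(values)

# First ten budgets from the budg_b output above.
sample_budgets = [100000000, 61000000, 55000000, 52500000, 37500000,
                  31000000, 23000000, 22500000, 21000000, 20000000]

mean_budget = average_sketch(sample_budgets)
median_budget = median(sample_budgets)
print(mean_budget, median_budget)
```

In this sample the single $100 million budget lifts the mean above the median, which is why reporting both can give a fairer picture of a typical budget.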
Using the 'round_to_multiple' function to round the budget of Drama movies that were released from July to December to the nearest $100,000.
bud_b = []
for i in cluster_b_index:bud_b.append(round_to_multiple(df_month['Budget'][i],100000))
print(bud_b)#showing the bud_b list
[100000000, 61000000, 55000000, 52500000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 20000000, 13000000, 13000000, 12000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4800000, 3400000, 3300000, 2000000, 2000000, 2000000, 2000000, 1000000, 1000000, 6000000, 100000, 11500000, 9000000, 180000000, 37000000, 20000000, 3000000, 5100000, 3000000, 20000000, 40000000, 5000000, 400000, 15000000, 32000000, 30000000, 500000, 15000000, 10000000, 20000000, 12000000, 7000000, 14000000, 12000000, 7000000, 20000000, 23000000, 2700000, 30000000, 700000, 85000000, 60000000, 900000, 17000000, 6400000, 13000000, 1800000, 110000000, 75000000, 60000000, 55000000, 50000000, 50000000, 50000000, 49000000, 47000000, 40000000, 40000000, 37000000, 36000000, 35000000, 26000000, 25000000, 25000000, 24000000, 20000000, 19000000, 15000000, 15000000, 14000000, 13000000, 11000000, 11000000, 9700000, 9000000, 6000000, 5000000, 2000000, 1400000, 200000, 6500000, 1000000, 1500000, 15000000, 4000000, 6500000, 4100000, 1000000, 1000000, 12000000, 15000000, 400000, 6500000, 34000000, 200000, 1000000, 15000000, 2200000, 0]
Showing how often each rounded budget value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.
print(Counter(bud_b))
Counter({20000000: 7, 15000000: 7, 1000000: 6, 12000000: 5, 2000000: 5, 13000000: 4, 5000000: 3, 40000000: 3, 50000000: 3, 6500000: 3, 55000000: 2, 23000000: 2, 6000000: 2, 9000000: 2, 37000000: 2, 3000000: 2, 400000: 2, 30000000: 2, 7000000: 2, 14000000: 2, 60000000: 2, 25000000: 2, 11000000: 2, 200000: 2, 100000000: 1, 61000000: 1, 52500000: 1, 37500000: 1, 31000000: 1, 22500000: 1, 21000000: 1, 11800000: 1, 9400000: 1, 8500000: 1, 4800000: 1, 3400000: 1, 3300000: 1, 100000: 1, 11500000: 1, 180000000: 1, 5100000: 1, 32000000: 1, 500000: 1, 10000000: 1, 2700000: 1, 700000: 1, 85000000: 1, 900000: 1, 17000000: 1, 6400000: 1, 1800000: 1, 110000000: 1, 75000000: 1, 49000000: 1, 47000000: 1, 36000000: 1, 35000000: 1, 26000000: 1, 24000000: 1, 19000000: 1, 9700000: 1, 1400000: 1, 1500000: 1, 4000000: 1, 4100000: 1, 34000000: 1, 2200000: 1, 0: 1})
Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December to the nearest $50 million.
rev_b = []
for i in cluster_b_index:rev_b.append(round_to_multiple(df_month['Revenue'][i],50000000))
print(rev_b)#showing the rev_b list
[450000000, 350000000, 50000000, 150000000, 50000000, 150000000, 100000000, 50000000, 50000000, 50000000, 50000000, 350000000, 50000000, 0, 50000000, 50000000, 0, 100000000, 0, 0, 0, 50000000, 0, 0, 0, 0, 0, 0, 50000000, 0, 0, 0, 200000000, 100000000, 300000000, 50000000, 0, 50000000, 300000000, 200000000, 50000000, 50000000, 0, 50000000, 50000000, 50000000, 50000000, 0, 150000000, 50000000, 150000000, 50000000, 50000000, 0, 450000000, 50000000, 50000000, 250000000, 0, 150000000, 100000000, 250000000, 50000000, 50000000, 50000000, 0, 700000000, 650000000, 150000000, 100000000, 200000000, 200000000, 100000000, 50000000, 200000000, 150000000, 150000000, 150000000, 100000000, 250000000, 50000000, 200000000, 50000000, 50000000, 0, 100000000, 0, 50000000, 50000000, 50000000, 100000000, 50000000, 50000000, 100000000, 0, 0, 50000000, 0, 0, 0, 0, 0, 50000000, 0, 0, 0, 0, 0, 200000000, 50000000, 0, 0, 50000000, 0, 0, 50000000, 0, 0]
Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December.
print(Counter(rev_b))
Counter({50000000: 41, 0: 40, 100000000: 10, 150000000: 9, 200000000: 7, 250000000: 3, 450000000: 2, 350000000: 2, 300000000: 2, 700000000: 1, 650000000: 1})
Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 million.
group_one_index = []
for i in cluster_b_index:
    if 40000 <= df_month['Budget'][i] <= 20000000:
        group_one_index.append(i)
print(group_one_index)  # showing the group_one_index list
[14, 15, 16, 18, 19, 20, 22, 25, 26, 28, 30, 33, 34, 38, 39, 41, 42, 44, 45, 49, 52, 54, 55, 58, 60, 62, 65, 66, 68, 69, 71, 77, 86, 87, 88, 89, 91, 92, 94, 103, 105, 108, 112, 120, 121, 123, 124, 125, 159, 160, 166, 167, 168, 169, 172, 173, 176, 177, 181, 185, 187, 188, 189, 191, 193, 196, 199, 201, 203, 204, 205, 206, 208, 209, 210, 211, 214, 215, 221, 222, 223]
Getting the index of Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 million and up to $180 million.
group_two_index = []
for i in cluster_b_index:
    if 20000001 <= df_month['Budget'][i] <= 180000000:
        group_two_index.append(i)
print(group_two_index)  # showing the group_two_index list
[0, 1, 5, 6, 8, 9, 10, 11, 13, 56, 57, 67, 72, 76, 106, 110, 113, 118, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 140, 142, 143, 150, 151, 154, 157, 213]
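The two budget groups can also be built in one pass, which guarantees that no movie lands in both groups. The function below is a hypothetical sketch, not code from the notebook. It takes a plain dictionary of index to budget instead of the 'df_month' dataframe, with thresholds copied from the two loops above and four index/budget pairs read off the outputs above.

```python
def split_by_budget(budgets_by_index, low=40_000, cutoff=20_000_000, high=180_000_000):
    """Split row indexes into a low-budget and a high-budget group.

    Group one: low <= budget <= cutoff. Group two: cutoff < budget <= high.
    Budgets outside [low, high] are left out of both groups.
    """
    group_one, group_two = [], []
    for idx, budget in budgets_by_index.items():
        if low <= budget <= cutoff:
            group_one.append(idx)
        elif cutoff < budget <= high:
            group_two.append(idx)
    return group_one, group_two

# Index -> budget pairs taken from the cluster_b_index and budg_b outputs.
sample = {0: 100000000, 13: 21000000, 14: 20000000, 15: 20000000}
low_budget, high_budget = split_by_budget(sample)
print(low_budget, high_budget)
```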
Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 million.
group_one = []
for i in group_one_index:group_one.append(df_month['Revenue'][i])
print(group_one)#showing the group_one list
[46604054, 28270399, 331266710, 36262783, 19859167, 35830713, 42843521, 21817298, 77733867, 17499242, 4972016, 20433227, 38969037, 11295324, 10153415, 6328516, 21270290, 16566240, 5438911, 54766923, 1951683, 13000000, 11000000, 304604712, 73975239, 9709597, 73986904, 305937718, 38102988, 27118000, 19344615, 33473297, 63954968, 15164458, 127956187, 43440294, 157297525, 35856053, 40716963, 18587135, 438656843, 37799643, 4517000, 268000000, 72071636, 30194409, 65500000, 7600377, 22281732, 76086711, 20601987, 59168692, 34044909, 33069303, 78356170, 62076141, 36787044, 81831866, 16369708, 6205034, 35185884, 5552584, 3728400, 20412841, 1008404, 20412216, 67091915, 19465835, 20412841, 19465835, 16566240, 2315026, 213120004, 65167430, 2661944, 20412841, 3894240, 2038916, 65167430, 5746453, 1008404]
Checking the number of elements in the 'group_one' list.
len(group_one)
81
Getting the minimum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 million.
min(group_one)
1008404
Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December with a budget of $40,000 to $20 million to the nearest $1 million.
rev_b_one = []
for i in group_one_index:rev_b_one.append(round_to_multiple(df_month['Revenue'][i],1000000))
print(rev_b_one)#showing the rev_b_one list
[47000000, 28000000, 331000000, 36000000, 20000000, 36000000, 43000000, 22000000, 78000000, 17000000, 5000000, 20000000, 39000000, 11000000, 10000000, 6000000, 21000000, 17000000, 5000000, 55000000, 2000000, 13000000, 11000000, 305000000, 74000000, 10000000, 74000000, 306000000, 38000000, 27000000, 19000000, 33000000, 64000000, 15000000, 128000000, 43000000, 157000000, 36000000, 41000000, 19000000, 439000000, 38000000, 5000000, 268000000, 72000000, 30000000, 66000000, 8000000, 22000000, 76000000, 21000000, 59000000, 34000000, 33000000, 78000000, 62000000, 37000000, 82000000, 16000000, 6000000, 35000000, 6000000, 4000000, 20000000, 1000000, 20000000, 67000000, 19000000, 20000000, 19000000, 17000000, 2000000, 213000000, 65000000, 3000000, 20000000, 4000000, 2000000, 65000000, 6000000, 1000000]
Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget of $40,000 to $20 million.
print(Counter(rev_b_one))
Counter({20000000: 6, 6000000: 4, 19000000: 4, 36000000: 3, 17000000: 3, 5000000: 3, 2000000: 3, 43000000: 2, 22000000: 2, 78000000: 2, 11000000: 2, 10000000: 2, 21000000: 2, 74000000: 2, 38000000: 2, 33000000: 2, 4000000: 2, 1000000: 2, 65000000: 2, 47000000: 1, 28000000: 1, 331000000: 1, 39000000: 1, 55000000: 1, 13000000: 1, 305000000: 1, 306000000: 1, 27000000: 1, 64000000: 1, 15000000: 1, 128000000: 1, 157000000: 1, 41000000: 1, 439000000: 1, 268000000: 1, 72000000: 1, 30000000: 1, 66000000: 1, 8000000: 1, 76000000: 1, 59000000: 1, 34000000: 1, 62000000: 1, 37000000: 1, 82000000: 1, 16000000: 1, 35000000: 1, 67000000: 1, 213000000: 1, 3000000: 1})
# 900,000 - 10,000,000 (#17)
stor12 = []
for i in group_one:
    if 900000 <= i <= 10000000:
        stor12.append(i)
print(stor12)  # showing the stor12 list
[4972016, 6328516, 5438911, 1951683, 9709597, 4517000, 7600377, 6205034, 5552584, 3728400, 1008404, 2315026, 2661944, 3894240, 2038916, 5746453, 1008404]
# 10,000,001 - 50,000,000 (#41)
stor13 = []
for i in group_one:
    if 10000001 <= i <= 50000000:
        stor13.append(i)
print(stor13)  # showing the stor13 list
[46604054, 28270399, 36262783, 19859167, 35830713, 42843521, 21817298, 17499242, 20433227, 38969037, 11295324, 10153415, 21270290, 16566240, 13000000, 11000000, 38102988, 27118000, 19344615, 33473297, 15164458, 43440294, 35856053, 40716963, 18587135, 37799643, 30194409, 22281732, 20601987, 34044909, 33069303, 36787044, 16369708, 35185884, 20412841, 20412216, 19465835, 20412841, 19465835, 16566240, 20412841]
# 50,000,001 - 100,000,000 (#15)
stor14 = []
for i in group_one:
    if 50000001 <= i <= 100000000:
        stor14.append(i)
print(stor14)  # showing the stor14 list
[77733867, 54766923, 73975239, 73986904, 63954968, 72071636, 65500000, 76086711, 59168692, 78356170, 62076141, 81831866, 67091915, 65167430, 65167430]
# 100,000,001 - 200,000,000 (#2)
stor15 = []
for i in group_one:
    if 100000001 <= i <= 200000000:
        stor15.append(i)
print(stor15)  # showing the stor15 list
[127956187, 157297525]
# 200,000,001 - 300,000,000 (#2)
stor16 = []
for i in group_one:
    if 200000001 <= i <= 300000000:
        stor16.append(i)
print(stor16)  # showing the stor16 list
[268000000, 213120004]
# over 300,000,000 (#4)
stor17 = []
for i in group_one:
    if 300000001 <= i:
        stor17.append(i)
print(stor17)  # showing the stor17 list
[331266710, 304604712, 305937718, 438656843]
Getting the revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 million and up to $180 million.
group_two = []
for i in group_two_index:group_two.append(df_month['Revenue'][i])
print(group_two)#showing the group_two list
[449948323, 368567189, 74966854, 134612435, 50647416, 160558438, 77735925, 32398681, 38017873, 180047784, 96068724, 216601214, 38741732, 64605762, 66947950, 246100000, 143985708, 90482317, 693698673, 634454789, 137551594, 90552675, 213591522, 179748880, 108660270, 71004627, 203127894, 162498338, 169590606, 173567581, 85309093, 252276928, 41059418, 213120004, 66540205, 64282881, 50283563]
Checking the number of elements in the 'group_two' list.
len(group_two)
37
Getting the maximum revenue generated by Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 million and up to $180 million.
max(group_two)
693698673
Using the 'round_to_multiple' function to round the revenue of Drama movies that were released from July to December with a budget greater than $20 million and up to $180 million to the nearest $50 million.
rev_b_two = []
for i in group_two_index:rev_b_two.append(round_to_multiple(df_month['Revenue'][i],50000000))
print(rev_b_two)#showing the rev_b_two list
[450000000, 350000000, 50000000, 150000000, 50000000, 150000000, 100000000, 50000000, 50000000, 200000000, 100000000, 200000000, 50000000, 50000000, 50000000, 250000000, 150000000, 100000000, 700000000, 650000000, 150000000, 100000000, 200000000, 200000000, 100000000, 50000000, 200000000, 150000000, 150000000, 150000000, 100000000, 250000000, 50000000, 200000000, 50000000, 50000000, 50000000]
Showing how often each rounded revenue value occurs for Drama movies from the 'Drama_DataFrame' dataframe that were released from July to December with a budget greater than $20 million and up to $180 million.
print(Counter(rev_b_two))
Counter({50000000: 12, 150000000: 7, 100000000: 6, 200000000: 6, 250000000: 2, 450000000: 1, 350000000: 1, 700000000: 1, 650000000: 1})
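To compare the rounded revenue buckets, the counts can be converted into percentage shares. The sketch below copies the Counter printed above. The variable names are hypothetical and the calculation is only an illustration of how the shares could be derived.

```python
from collections import Counter

# Counter copied from the printed rev_b_two frequencies above.
rev_b_two_counts = Counter({50000000: 12, 150000000: 7, 100000000: 6,
                            200000000: 6, 250000000: 2, 450000000: 1,
                            350000000: 1, 700000000: 1, 650000000: 1})

total = sum(rev_b_two_counts.values())  # number of movies in the group
shares = {value: 100 * count / total for value, count in rev_b_two_counts.items()}

# List buckets from most to least frequent with their percentage share.
for value, count in rev_b_two_counts.most_common():
    print(f"{value:>11,}: {count:>2} movies ({shares[value]:.1f}%)")
```

The total of the counts equals the 37 movies reported by `len(group_two)`, which is a quick consistency check on the bucket counts.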
# 100,000,000 - 150,000,000 (#4)
for i in group_two:
    if 100000000 <= i <= 150000000:
        print(i)
134612435 143985708 137551594 108660270
# 150,000,000 - 200,000,000 (#6)
for i in group_two:
    if 150000000 <= i <= 200000000:
        print(i)
160558438 180047784 179748880 162498338 169590606 173567581
# 200,000,000 - 250,000,000 (#5)
for i in group_two:
    if 200000000 <= i <= 250000000:
        print(i)
216601214 246100000 213591522 203127894 213120004
# 250,000,000 - 300,000,000 (#1)
for i in group_two:
    if 250000000 <= i <= 300000000:
        print(i)
252276928
# 350,000,000 - 450,000,000 (#2)
for i in group_two:
    if 350000000 <= i <= 450000000:
        print(i)
449948323 368567189
# 550,000,000 - 650,000,000 (#1)
for i in group_two:
    if 550000000 <= i <= 650000000:
        print(i)
634454789
# 650,000,000 - 800,000,000 (#1)
for i in group_two:
    if 650000000 <= i <= 800000000:
        print(i)
693698673
Assigning a season to each R-rated Drama movie based on the month it was released (1 = December to February, 2 = March to May, 3 = June to August, 4 = September to November).
season_r = []
for i in r_month:
    if i in [12, 1, 2]:
        season_r.append(1)
    elif i in [3, 4, 5]:
        season_r.append(2)
    elif i in [6, 7, 8]:
        season_r.append(3)
    elif i in [9, 10, 11]:
        season_r.append(4)
print(season_r)  # showing the season_r list
[1, 4, 2, 1, 1, 4, 1, 1, 4, 4, 4, 4, 2, 4, 3, 4, 1, 2, 4, 1, 4, 2, 4, 1, 3, 4, 4, 1, 4, 1, 4, 3, 1, 4, 4, 2, 2, 3, 4, 3, 2, 4, 4, 2, 1, 4, 2, 2, 3, 4, 2, 3, 1, 1, 4, 4]
Assigning a season to each PG-rated Drama movie based on the month it was released.
season_pg = []
for i in pg_month:
    if i in [12, 1, 2]:
        season_pg.append(1)
    elif i in [3, 4, 5]:
        season_pg.append(2)
    elif i in [6, 7, 8]:
        season_pg.append(3)
    elif i in [9, 10, 11]:
        season_pg.append(4)
print(season_pg)  # showing the season_pg list
[4, 4, 4, 2, 3, 1, 4, 3, 2, 3, 4, 1, 3, 1, 1, 4, 4, 3, 2, 1, 4, 4, 2, 2, 1, 3, 3, 2, 1, 1, 4, 4, 1, 4, 3, 4, 1, 1, 1, 3, 2, 3, 3, 2, 1, 2]
Assigning a season to each G-rated Drama movie based on the month it was released.
season_g = []
for i in g_month:
    if i in [12, 1, 2]:
        season_g.append(1)
    elif i in [3, 4, 5]:
        season_g.append(2)
    elif i in [6, 7, 8]:
        season_g.append(3)
    elif i in [9, 10, 11]:
        season_g.append(4)
print(season_g)  # showing the season_g list
[2, 4, 2, 4, 3, 3, 4, 3, 3, 2, 1, 4, 3, 2, 2, 2, 1, 3, 3, 1, 2, 4, 4, 4, 2]
Assigning a season to each PG-13 rated Drama movie based on the month it was released.
season_pg13 = []
for i in pg13_month:
    if i in [12, 1, 2]:
        season_pg13.append(1)
    elif i in [3, 4, 5]:
        season_pg13.append(2)
    elif i in [6, 7, 8]:
        season_pg13.append(3)
    elif i in [9, 10, 11]:
        season_pg13.append(4)
print(season_pg13)  # showing the season_pg13 list
[4, 1, 4, 4, 4, 1, 4, 4, 4, 3, 4, 1, 2, 4, 1, 1, 1, 2, 2, 3, 1, 2, 1, 4, 3, 1, 2, 3, 2, 1, 1, 3, 4, 4, 2, 2, 1, 2, 1, 1, 3, 4, 4, 1, 3, 3, 4, 2, 2, 1, 4, 1, 1, 2, 4, 3, 1, 2, 1, 2, 4, 4, 4, 3]
Assigning a season to each NC-17 rated Drama movie based on the month it was released.
season_nc17 = []
for i in nc17_month:
    if i in [12, 1, 2]:
        season_nc17.append(1)
    elif i in [3, 4, 5]:
        season_nc17.append(2)
    elif i in [6, 7, 8]:
        season_nc17.append(3)
    elif i in [9, 10, 11]:
        season_nc17.append(4)
print(season_nc17)  # showing the season_nc17 list
[1, 2, 4, 2, 2, 4, 2, 1, 4, 1, 4, 1, 1, 4, 1, 4, 2, 4, 4, 4, 1, 3, 3, 4, 4, 3, 3, 2, 1, 2, 4, 4, 4, 2]
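The five season blocks above repeat the same month-to-season mapping. One way to avoid the repetition is a small helper that is applied to each month list. The sketch below uses a hypothetical function name and a short made-up month list; the season codes follow the mapping used above.

```python
def month_to_season(month):
    """Map a month number (1-12) to the season code used in this notebook.

    1 = Dec-Feb, 2 = Mar-May, 3 = Jun-Aug, 4 = Sep-Nov.
    """
    if month in (12, 1, 2):
        return 1
    if month in (3, 4, 5):
        return 2
    if month in (6, 7, 8):
        return 3
    if month in (9, 10, 11):
        return 4
    raise ValueError(f"not a valid month number: {month!r}")

# Hypothetical month list standing in for r_month, pg_month, and so on.
example_months = [12, 10, 4, 7]
example_seasons = [month_to_season(m) for m in example_months]
print(example_seasons)
```

Raising an error for an invalid month makes a bad value visible immediately, whereas the chained `if` checks would silently skip it and leave the season list shorter than the month list.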
Creating the df_season dataframe.
df_season = pd.DataFrame({'Season':season_r+season_pg+season_g+season_pg13+season_nc17,
"Opening_Weekend":r_opening_weekend+pg_opening_weekend
+g_opening_weekend+pg13_opening_weekend+nc17_opening_weekend,
"Profit":profit_int+profit_int1+profit_int2+profit_int3+profit_int4
})
The 'df_season' dataframe. (this dataframe is interactive)
df_season
[Interactive table: columns Season, Opening_Weekend, Profit]
Creating a 3D scatter plot of the Season, Opening Weekend and Profit of the movies in the Drama genre from the 'Drama_DataFrame' dataframe, using the 'animate' function and 'animation.FuncAnimation' to create a rotating 3D scatter plot animation.
def animate(i):
    # rotate the camera: the azimuth advances 4 degrees per frame (90 frames = 360 degrees)
    ax.view_init(elev=10, azim=i*4)
    return fig
fig = plt.figure()
# add_subplot(projection='3d') replaces the deprecated Axes3D(fig) call
ax = fig.add_subplot(projection='3d')
ax.scatter(df_season['Season'], df_season['Opening_Weekend'], df_season['Profit'], alpha=0.5, s=50, color='#ff4500')
ax.set_xlabel('Season')
ax.set_ylabel('Opening_Weekend')
ax.set_zlabel('Profit')
ani = animation.FuncAnimation(fig, animate, frames=90, interval=50, blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfc38f8820>
Saving the animated 3D scatter plot as the GIF 'drama4.gif'.
writergif = animation.PillowWriter(fps=10)
ani.save('drama4.gif', writer=writergif)  # pass the Pillow writer explicitly
The third 3D scatter plot (part A): the x-axis is the 'Season', the y-axis is the 'Opening_Weekend' and the z-axis is the 'Profit'. This animation shows the movies in the Drama genre from the 'Drama_DataFrame' dataframe before they are partitioned into k clusters, in which each observation belongs to the cluster with the nearest mean. The clusters are based on seasons and are then analyzed by observing the profit generated per cluster.
Getting the sum of squared errors (SSE) for the Season, Opening Weekend and Profit of the movies in the Drama genre from the 'Drama_DataFrame' dataframe to determine the optimal number of clusters.
k_rng = range(1, 10)
sse = []
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df_season[['Season', 'Opening_Weekend', 'Profit']])
    sse.append(km.inertia_)  # inertia_ is the within-cluster sum of squared distances
Showing the 'sse' list.
sse
[3.2618910215856246e+18, 9.963479636909289e+17, 5.2194459296822835e+17, 3.142278548042057e+17, 1.778114385874955e+17, 1.046883781436286e+17, 8.207687468107851e+16, 6.136436882636602e+16, 4.79187521252093e+16]
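The SSE values printed above can also be compared numerically before reading the elbow plot. The sketch below is an illustration only, with hypothetical variable names. It copies the printed 'sse' list and computes, for each k from 2 to 9, the fraction by which the SSE falls relative to k - 1. A large drop followed by smaller ones is what the elbow method looks for.

```python
# SSE values copied from the printed sse list (k = 1 .. 9).
sse_values = [3.2618910215856246e+18, 9.963479636909289e+17, 5.2194459296822835e+17,
              3.142278548042057e+17, 1.778114385874955e+17, 1.046883781436286e+17,
              8.207687468107851e+16, 6.136436882636602e+16, 4.79187521252093e+16]

# Relative reduction in SSE when going from k - 1 clusters to k clusters.
relative_drop = {
    k: 1 - sse_values[k - 1] / sse_values[k - 2]
    for k in range(2, len(sse_values) + 1)
}

for k, drop in relative_drop.items():
    print(f"k={k}: SSE falls by {drop:.1%} compared with k={k - 1}")

# The k with the largest relative drop is one simple candidate for the elbow.
k_largest_drop = max(relative_drop, key=relative_drop.get)
print(k_largest_drop)
```

This is a heuristic, not a rule: here the largest drop occurs at k = 2, but the SSE keeps falling substantially for larger k. The three features are also on very different scales, so Profit dominates the distances unless the features are standardized first.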
Plotting the sum of squared errors (SSE) against the number of clusters to determine the optimal number of clusters for the movies in the Drama genre from the 'Drama_DataFrame' dataframe using the elbow method. The curve bends most sharply at k = 2, so two clusters are used.
plt.xlabel('Number of clusters (k)')
plt.ylabel('Sum of Squared Error')
plt.plot(k_rng,sse)
[<matplotlib.lines.Line2D at 0x2bfbe35ed30>]
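Creating the cluster list. The labels are assigned by season rather than taken from a fitted KMeans model: seasons 1 and 2 go to cluster 0, and seasons 3 and 4 go to cluster 1.
y_predicted = []
for i in df_season["Season"]:
    if i in [1, 2]:
        y_predicted.append(0)
    elif i in [3, 4]:
        y_predicted.append(1)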
Adding the cluster list to the 'df_season' dataframe.
df_season['cluster'] = y_predicted
The updated 'df_season' dataframe. (this dataframe is interactive)
df_season
| Season | Opening_Weekend | Profit | cluster |
|---|---|---|---|
Creating a 3D scatter plot of the Season, Opening Weekend and Profit of the Drama movies from the 'df_season' dataframe. The 'animate' function and 'animation.FuncAnimation' are used to create the animated 3D scatter plot.
def animate(i):
    # Rotate the camera: the azimuth angle steps by 4 degrees per frame
    ax.view_init(elev=10, azim=i*4)
    return fig
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # create the 3D axes on the figure
df1 = df_season[df_season.cluster==0]  # Winter and Spring
df2 = df_season[df_season.cluster==1]  # Summer and Autumn
scatter = ax.scatter(df1['Season'],df1['Opening_Weekend'],df1['Profit'], alpha=0.5,s=50, color='#ff4500')
scatter = ax.scatter(df2['Season'],df2['Opening_Weekend'],df2['Profit'], alpha=0.5,s=50, color='#960018')
# Label each axis with the column that is plotted on it
ax.set_xlabel('Season')
ax.set_ylabel('Opening_Weekend')
ax.set_zlabel('Profit')
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfc3d67430>
Saving the animated 3D scatter plot gif as 'drama5.gif'.
writergif = animation.PillowWriter(fps=10)
ani.save('drama5.gif', writer=writergif)  # pass the Pillow writer explicitly
The third 3D Scatter Plot (part B): the x-axis is the 'Season', the y-axis is the 'Opening Weekend' and the z-axis is the 'Profit'. The purpose of this animation is to partition the Drama movies from the 'Drama_DataFrame' dataframe into clusters. The clusters are based on seasons: Winter and Spring form one cluster, Summer and Autumn form the other. The clusters are then analyzed by observing the amount of Profit generated per cluster.
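The rotation and the length of the saved GIF follow directly from the numbers used in the animation code: 90 frames, 4 degrees of azimuth per frame, and 10 frames per second when saving. A small sketch of that arithmetic (the variable names are made up for the illustration):

```python
frames = 90            # 'frames' argument passed to FuncAnimation
degrees_per_frame = 4  # animate() sets azim = i * 4
fps = 10               # frame rate used when the GIF is saved

# Azimuth angle shown in each frame of the animation.
azimuths = [i * degrees_per_frame for i in range(frames)]

# Playback length of the saved GIF in seconds.
duration_seconds = frames / fps

print(azimuths[0], azimuths[-1])
print(duration_seconds)
```

The last frame sits 4 degrees short of a full turn, so the GIF loops without showing the starting view twice. The 'interval=50' argument only affects playback inside the notebook, not the saved file.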
Getting the index of all the Drama movies that were released in Winter and Spring, from the 'df_season' dataframe.
cluster_a_index = []
for i,x in enumerate(df_season.cluster):
    if x == 0:cluster_a_index.append(i)
print(cluster_a_index)#showing the cluster_a_index list
[0, 2, 3, 4, 6, 7, 12, 16, 17, 19, 21, 23, 27, 29, 32, 35, 36, 40, 43, 44, 46, 47, 50, 52, 53, 59, 61, 64, 67, 69, 70, 74, 75, 78, 79, 80, 83, 84, 85, 88, 92, 93, 94, 96, 99, 100, 101, 102, 104, 111, 112, 115, 116, 117, 118, 121, 122, 126, 128, 132, 138, 139, 141, 142, 143, 144, 145, 147, 148, 149, 152, 153, 155, 156, 157, 161, 162, 163, 164, 165, 166, 170, 174, 175, 176, 178, 179, 180, 183, 184, 185, 186, 191, 192, 194, 195, 197, 198, 200, 202, 203, 205, 207, 211, 218, 219, 220, 224]
Checking the number of elements in the 'cluster_a_index' list.
len(cluster_a_index)
108
Using the indexes from the 'cluster_a_index' list to get the Season, Profit and Opening Weekend of each movie that was released in Winter and Spring.
season_a = []
profit_a = []
open_a = []
for i in cluster_a_index:
    season_a.append(df_season['Season'][i])
    profit_a.append(df_season['Profit'][i])
    open_a.append(df_season['Opening_Weekend'][i])
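The index loop and the three append loops can also be expressed as a single filtering pass. The sketch below uses plain Python tuples in place of the dataframe rows; the six rows are the first six movies implied by the lists printed in this section, and the names 'rows' and 'cluster_a' are made up for the illustration.

```python
# (Season, Opening_Weekend, Profit, cluster) for the first six movies,
# reconstructed from the lists printed in this section.
rows = [
    (1, 30122888, 349948323, 0),
    (4, 37513109, 307567189, 1),
    (2, 14953664, 24154026, 0),
    (1, 46607250, 326398492, 0),
    (1, 38560195, 316350619, 0),
    (4, 13143310, 19966854, 1),
]

# Keep the position of every row that belongs to cluster 0 (Winter and Spring).
cluster_a = [(i, row) for i, row in enumerate(rows) if row[3] == 0]

cluster_a_index = [i for i, _ in cluster_a]
season_a = [row[0] for _, row in cluster_a]
open_a = [row[1] for _, row in cluster_a]
profit_a = [row[2] for _, row in cluster_a]

print(cluster_a_index)
print(season_a)
print(profit_a)
```

Filtering once and deriving every list from the same selection keeps the index, season, opening weekend and profit lists aligned by construction.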
Showing the 'season_a' list.
print(season_a)
[1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 2]
Showing the 'profit_a' list.
print(profit_a)
[349948323, 24154026, 326398492, 316350619, 82112435, 530998101, 8554727, 318266710, 25358392, 7859167, 34913, 45178935, 3765283, 12636004, 36954520, 20251930, 14610760, 88390, 12744931, 15566240, 156309, 294448, 68711836, 1851683, 556082, 72678948, 10531500, 447351353, 176601214, 26696000, 35694916, 10948425, 120587063, 69137047, 62667874, 83269971, 3835130, 118582776, 3101815, 107956187, 21856053, 104285432, 28716963, 108052686, 3943124, 71808942, 20000000, 1711143, 58693537, 1250000, 3851000, 58491516, 293281000, 278014195, 30482317, 55071636, 37707417, 10300000, 559454789, 129748880, 129590606, 78809717, 60143987, 49309093, 217276928, 26721826, 29802928, 167618160, 38984536, 66050951, 117033509, 71633833, 4847480, 57917283, 40282881, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 20909437, 51603136, 21556959, 27087044, 12971021, 23787727, 29964656, 36699612, 13945682, 1205034, 12698355, 13912841, 4856268, 257845, 659312, 89410061, 121165, 13912841, 307113, 13912841, 15566240, 256669, 13912841, 94673038, 34897711, 401802, 858737]
Getting the maximum Profit generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.
max(profit_a)
559454789
Getting the minimum Profit generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.
min(profit_a)
34913
Showing the 'open_a' list.
print(open_a)
[30122888, 14953664, 46607250, 38560195, 24400000, 85171450, 1220335, 1443809, 237264, 224476, 160547, 47122, 24587, 473882, 8800230, 246914, 6661234, 81006, 3762145, 193728, 63461, 36134, 118150, 2105729, 63356, 16007426, 44542, 67877361, 16755310, 0, 12177488, 6011585, 22564512, 16007426, 9244641, 14466, 124011, 721341, 82601, 5609875, 93005, 89213, 0, 16015408, 46977, 8556935, 5088381, 0, 16021684, 0, 679185, 16021684, 4625583, 0, 10103675, 0, 0, 0, 35258, 526011, 143818, 16842353, 14789393, 7102085, 24830443, 372920, 13019686, 41202458, 13203458, 21401594, 30468614, 22618358, 9783603, 13002632, 129462, 9851102, 15002635, 8089139, 20874072, 30452, 30452, 8310232, 11727390, 2215891, 68266, 6213362, 13501349, 446380, 212000, 4690214, 53778, 55438, 361000, 69100, 0, 0, 738339, 143632, 361000, 142632, 361000, 193728, 24286, 361000, 738339, 100000, 70188, 0]
Getting the maximum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.
max(open_a)
85171450
Getting the minimum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Winter and Spring.
min(open_a)
0
Showing the frequency of each season among the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter and Spring. The result is stored in a Counter called 'grouped_season_a'.
grouped_season_a= Counter(season_a)
print(grouped_season_a)#showing the grouped_season_a list
Counter({1: 58, 2: 50})
Using the 'round_to_multiple' function to round the Opening Weekend of Drama movies that were released in Winter and Spring to the nearest $4 Million.
open1_a = []
for i in open_a:open1_a.append(round_to_multiple(i,4000000))
print(open1_a)#showing the open1_a list
[32000000, 16000000, 48000000, 40000000, 24000000, 84000000, 0, 0, 0, 0, 0, 0, 0, 0, 8000000, 0, 8000000, 0, 4000000, 0, 0, 0, 0, 4000000, 0, 16000000, 0, 68000000, 16000000, 0, 12000000, 8000000, 24000000, 16000000, 8000000, 0, 0, 0, 0, 4000000, 0, 0, 0, 16000000, 0, 8000000, 4000000, 0, 16000000, 0, 0, 16000000, 4000000, 0, 12000000, 0, 0, 0, 0, 0, 0, 16000000, 16000000, 8000000, 24000000, 0, 12000000, 40000000, 12000000, 20000000, 32000000, 24000000, 8000000, 12000000, 0, 8000000, 16000000, 8000000, 20000000, 0, 0, 8000000, 12000000, 4000000, 0, 8000000, 12000000, 0, 0, 4000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
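The 'round_to_multiple' helper is defined outside this section. A minimal sketch that is consistent with the outputs shown here, assuming it rounds to the nearest multiple, could look like this (the body is a guess, not the notebook's actual definition):

```python
def round_to_multiple(number, multiple):
    """Round 'number' to the nearest multiple of 'multiple' (assumed behaviour)."""
    return multiple * round(number / multiple)


# Opening Weekend values from 'open_a', rounded to the nearest $4 Million.
print(round_to_multiple(30122888, 4000000))
print(round_to_multiple(1220335, 4000000))
```

Python's built-in round() sends exact halves to the nearest even number, so a value lying exactly halfway between two multiples may round down rather than up.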
Showing the frequency of each rounded Opening Weekend value of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter and Spring. The result is stored in a Counter called 'grouped_season_a'.
grouped_season_a = Counter(open1_a)
print(grouped_season_a)#showing the grouped_season_a list
Counter({0: 60, 8000000: 11, 16000000: 10, 4000000: 7, 12000000: 7, 24000000: 4, 32000000: 2, 40000000: 2, 20000000: 2, 48000000: 1, 84000000: 1, 68000000: 1})
Using the 'round_to_multiple' function to round the Profit of Drama movies that were released in Winter and Spring to the nearest $10 Million.
profit_a_one = []
for i in profit_a:profit_a_one.append(round_to_multiple(i,10000000))
print(profit_a_one)#showing the profit_a_one list
[350000000, 20000000, 330000000, 320000000, 80000000, 530000000, 10000000, 320000000, 30000000, 10000000, 0, 50000000, 0, 10000000, 40000000, 20000000, 10000000, 0, 10000000, 20000000, 0, 0, 70000000, 0, 0, 70000000, 10000000, 450000000, 180000000, 30000000, 40000000, 10000000, 120000000, 70000000, 60000000, 80000000, 0, 120000000, 0, 110000000, 20000000, 100000000, 30000000, 110000000, 0, 70000000, 20000000, 0, 60000000, 0, 0, 60000000, 290000000, 280000000, 30000000, 60000000, 40000000, 10000000, 560000000, 130000000, 130000000, 80000000, 60000000, 50000000, 220000000, 30000000, 30000000, 170000000, 40000000, 70000000, 120000000, 70000000, 0, 60000000, 40000000, 320000000, 20000000, 40000000, 40000000, 110000000, 10000000, 20000000, 50000000, 20000000, 30000000, 10000000, 20000000, 30000000, 40000000, 10000000, 0, 10000000, 10000000, 0, 0, 0, 90000000, 0, 10000000, 0, 10000000, 20000000, 0, 10000000, 90000000, 30000000, 0, 0]
Showing the frequency of each rounded Profit value of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter and Spring.
print(Counter(profit_a_one))
Counter({0: 23, 10000000: 16, 20000000: 10, 30000000: 9, 40000000: 8, 70000000: 6, 60000000: 6, 320000000: 3, 80000000: 3, 50000000: 3, 120000000: 3, 110000000: 3, 130000000: 2, 90000000: 2, 350000000: 1, 330000000: 1, 530000000: 1, 450000000: 1, 180000000: 1, 100000000: 1, 290000000: 1, 280000000: 1, 560000000: 1, 220000000: 1, 170000000: 1})
Getting the index of Drama movies from the 'df_season' dataframe that were released in Winter and Spring and that have an Opening Weekend of $10 Million to $20 Million.
group_one_index = []
for i in cluster_a_index:
    if 10000000 <= df_season['Opening_Weekend'][i] <= 20000000:group_one_index.append(i)
print(group_one_index)#showing the group_one_index list
[2, 59, 67, 70, 78, 96, 104, 115, 118, 139, 141, 145, 148, 156, 162, 174, 179]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
17
Getting the index of Drama movies from the 'df_season' dataframe that were released in Winter and Spring and that have an Opening Weekend of more than $20 Million, up to $90 Million.
group_two_index = []
for i in cluster_a_index:
    if 20000001 <= df_season['Opening_Weekend'][i] <= 90000000:group_two_index.append(i)
print(group_two_index)#showing the group_two_index list
[0, 3, 4, 6, 7, 64, 75, 143, 147, 149, 152, 153, 164]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
13
Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million.
group_one = []
for i in group_one_index:group_one.append(df_season['Profit'][i])
print(group_one)#showing the group_one list
[24154026, 72678948, 176601214, 35694916, 69137047, 108052686, 58693537, 58491516, 30482317, 78809717, 60143987, 29802928, 38984536, 57917283, 21028230, 51603136, 23787727]
Checking the number of elements in the 'group_one' list.
len(group_one)
17
Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million, up to $90 Million.
group_two = []
for i in group_two_index:group_two.append(df_season['Profit'][i])
print(group_two)#showing the group_two list
[349948323, 326398492, 316350619, 82112435, 530998101, 447351353, 120587063, 217276928, 167618160, 66050951, 117033509, 71633833, 40506120]
Checking the number of elements in the 'group_two' list.
len(group_two)
13
Showing the frequency of each Profit value, rounded to the nearest $10 Million, of the Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million. The rounded values are stored in a list called 'profit_a_one'.
profit_a_one = []
for i in group_one_index:profit_a_one.append(round_to_multiple(df_season['Profit'][i],10000000))
Counter(profit_a_one)
Counter({20000000: 3,
70000000: 2,
180000000: 1,
40000000: 2,
110000000: 1,
60000000: 4,
30000000: 2,
80000000: 1,
50000000: 1})
The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million is about $180 Million (rounded to the nearest $10 Million).
max(profit_a_one)
180000000
The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of $10 Million to $20 Million is about $20 Million (rounded to the nearest $10 Million).
min(profit_a_one)
20000000
# Profit of $20 Million to $40 Million (7 of 17 movies)
for i in group_one:
    if 20000000 <= i<= 40000000:print(i)
24154026 35694916 30482317 29802928 38984536 21028230 23787727
# Profit of more than $40 Million, up to $60 Million (4 of 17 movies)
for i in group_one:
    if 40000001 <= i<= 60000000:print(i)
58693537 58491516 57917283 51603136
# Profit of more than $60 Million, up to $80 Million (4 of 17 movies)
for i in group_one:
    if 60000001 <= i<= 80000000:print(i)
72678948 69137047 78809717 60143987
# Profit of $100 Million or more (2 of 17 movies)
for i in group_one:
    if 100000000 <= i:print(i)
176601214 108052686
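The four range checks above can be collapsed into one helper that counts the movies per Profit range and reports each share. This is a sketch using the 'group_one' values printed earlier; the names 'BINS' and 'count_in_bins' are made up for the illustration.

```python
# Profit values copied from the 'group_one' list printed above
# (Winter and Spring, Opening Weekend of $10 Million to $20 Million).
group_one = [24154026, 72678948, 176601214, 35694916, 69137047, 108052686,
             58693537, 58491516, 30482317, 78809717, 60143987, 29802928,
             38984536, 57917283, 21028230, 51603136, 23787727]

# (label, inclusive lower bound, inclusive upper bound or None if open-ended)
BINS = [
    ("$20M-$40M", 20_000_000, 40_000_000),
    ("$40M-$60M", 40_000_001, 60_000_000),
    ("$60M-$80M", 60_000_001, 80_000_000),
    ("$100M+", 100_000_000, None),
]


def count_in_bins(values, bins):
    """Count how many values fall inside each (low, high) range."""
    counts = {}
    for label, low, high in bins:
        counts[label] = sum(
            1 for v in values if v >= low and (high is None or v <= high)
        )
    return counts


counts = count_in_bins(group_one, BINS)
for label, n in counts.items():
    print(f"{label}: {n} movies ({n / len(group_one):.0%})")
```

Keeping the ranges in one table makes it easier to see that no movie in this group falls between $80 Million and $100 Million.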
Showing the frequency of each Profit value, rounded to the nearest $10 Million, of the Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million, up to $90 Million. The rounded values are stored in a list called 'profit_a_two'.
profit_a_two = []
for i in group_two_index:profit_a_two.append(round_to_multiple(df_season['Profit'][i],10000000))
Counter(profit_a_two)
Counter({350000000: 1,
330000000: 1,
320000000: 1,
80000000: 1,
530000000: 1,
450000000: 1,
120000000: 2,
220000000: 1,
170000000: 1,
70000000: 2,
40000000: 1})
The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million, up to $90 Million, is about $530 Million (rounded to the nearest $10 Million).
max(profit_a_two)
530000000
The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Winter and Spring with an Opening Weekend of more than $20 Million, up to $90 Million, is about $40 Million (rounded to the nearest $10 Million).
min(profit_a_two)
40000000
# Profit of $40 Million to $80 Million (3 of 13 movies)
for i in group_two:
    if 40000000 <= i<= 80000000:print(i)
66050951 71633833 40506120
# Profit of more than $80 Million, up to $200 Million (4 of 13 movies)
for i in group_two:
    if 80000001 <= i<= 200000000:print(i)
82112435 120587063 167618160 117033509
# Profit of more than $200 Million, up to $400 Million (4 of 13 movies)
for i in group_two:
    if 200000001 <= i<=400000000:print(i)
349948323 326398492 316350619 217276928
# Profit of more than $400 Million, up to $500 Million (1 of 13 movies)
for i in group_two:
    if 400000001 <= i<=500000000:print(i)
447351353
# Profit of more than $500 Million (1 of 13 movies)
for i in group_two:
    if 500000001 <= i:print(i)
530998101
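A compact way to compare the two Winter and Spring groups is to look at the median Profit of each. The sketch below copies the 'group_one' and 'group_two' values printed earlier; using the median as the summary statistic is a choice made for this illustration, not something the analysis above prescribes.

```python
from statistics import median

# Opening Weekend of $10 Million to $20 Million (17 movies).
group_one = [24154026, 72678948, 176601214, 35694916, 69137047, 108052686,
             58693537, 58491516, 30482317, 78809717, 60143987, 29802928,
             38984536, 57917283, 21028230, 51603136, 23787727]

# Opening Weekend of more than $20 Million, up to $90 Million (13 movies).
group_two = [349948323, 326398492, 316350619, 82112435, 530998101, 447351353,
             120587063, 217276928, 167618160, 66050951, 117033509, 71633833,
             40506120]

median_one = median(group_one)
median_two = median(group_two)

print(f"Median Profit, smaller openings: ${median_one:,}")
print(f"Median Profit, larger openings:  ${median_two:,}")
```

The median is less sensitive than the maximum to a single very profitable movie, which makes it a steadier basis for comparing the groups.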
Getting the index of all the Drama movies that were released in Summer and Autumn, from the 'df_season' dataframe.
cluster_b_index = []
for i,x in enumerate(df_season.cluster):
    if x == 1:cluster_b_index.append(i)
print(cluster_b_index)#showing the cluster_b_index list
[1, 5, 8, 9, 10, 11, 13, 14, 15, 18, 20, 22, 24, 25, 26, 28, 30, 31, 33, 34, 37, 38, 39, 41, 42, 45, 48, 49, 51, 54, 55, 56, 57, 58, 60, 62, 63, 65, 66, 68, 71, 72, 73, 76, 77, 81, 82, 86, 87, 89, 90, 91, 95, 97, 98, 103, 105, 106, 107, 108, 109, 110, 113, 114, 119, 120, 123, 124, 125, 127, 129, 130, 131, 133, 134, 135, 136, 137, 140, 146, 150, 151, 154, 158, 159, 160, 167, 168, 169, 171, 172, 173, 177, 181, 182, 187, 188, 189, 190, 193, 196, 199, 201, 204, 206, 208, 209, 210, 212, 213, 214, 215, 216, 217, 221, 222, 223]
Checking the number of elements in the 'cluster_b_index' list.
len(cluster_b_index)
117
Using the indexes from the 'cluster_b_index' list to get the Season, Profit and Opening Weekend of each movie that was released in Summer and Autumn.
season_b = []
profit_b = []
open_b = []
for i in cluster_b_index:
    season_b.append(df_season['Season'][i])
    profit_b.append(df_season['Profit'][i])
    open_b.append(df_season['Opening_Weekend'][i])
Showing the 'season_b' list.
print(season_b)
[4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 3, 4, 3, 4, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 3, 4, 3, 3, 4, 3, 4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 3, 3, 3, 4, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4, 4]
Showing the 'open_b' list.
print(open_b)
[37513109, 13143310, 736311, 24900566, 10470145, 492648, 19497324, 9700000, 5100000, 118298, 2002165, 253510, 13575172, 257174, 256498, 7485546, 52041, 387618, 561906, 135388, 84797, 156833, 1767308, 18623, 100268, 137651, 104030, 170335, 13307125, 2337594, 287081, 11364505, 19152401, 27547866, 11351389, 1203011, 0, 11351389, 27547866, 8146533, 5268764, 9178233, 13616196, 9421369, 6836036, 24517121, 20584908, 1528982, 2739680, 298277, 2189966, 89054, 2534729, 518795, 12146143, 2914486, 162146, 10028065, 7810481, 0, 21037414, 8742545, 11457353, 220297, 1586753, 0, 0, 0, 0, 55785112, 22403596, 11947744, 35574710, 220522, 320690, 24074047, 12381585, 15371203, 29632823, 11731703, 10003827, 26044590, 12305016, 18723269, 4765838, 105005, 5079566, 76244, 228359, 5467084, 187281, 15679190, 14065500, 4750894, 21688103, 9112839, 20321, 128140, 77740, 0, 85709, 63918, 100316, 100316, 649423, 11014818, 63918, 25775847, 0, 11166687, 31665, 245398, 0, 85709, 63918, 130303, 0]
Getting the maximum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.
max(open_b)
55785112
Getting the minimum Opening Weekend generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.
min(open_b)
0
Showing the 'profit_b' list.
print(profit_b)
[307567189, 19966854, 13147416, 129558438, 54735925, 9898681, 17017873, 26604054, 8270399, 23262783, 23830713, 31043521, 60133905, 12417298, 69233867, 12499242, 222016, 53273049, 17033227, 35669037, 14131551, 9295324, 8153415, 4328516, 19282640, 4438911, 2669782, 48766923, 14718173, 1500000, 2000000, 47784, 59068724, 284604712, 70975239, 4609597, 36918287, 70986904, 285937718, 33102988, 4344615, 6741732, 74830111, 34605762, 32973297, 120036382, 81120329, 48954968, 5164458, 31440294, 12815212, 150297525, 7423752, 544368315, 42892670, 11587135, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 58985708, 7657973, 941214868, 267142000, 23794409, 52500000, 5850377, 583698673, 77551594, 35552675, 163591522, 58660270, 22004627, 156127894, 4478084, 122498338, 136567581, 132552290, 15059418, 188120004, 41540205, 188265198, 2281732, 57086711, 44168692, 20044909, 20069303, 11477345, 67356170, 51076141, 72831866, 10369708, 143806510, 33185884, 4152584, 3478400, 1927779, 8404, 18912216, 52091915, 15465835, 15390895, 1315026, 201120004, 50167430, 2311944, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 50167430, 3546453, 958404]
Getting the maximum Profit generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.
max(profit_b)
941214868
Getting the minimum Profit generated by Drama movies from the 'df_season' dataframe that were released in Summer and Autumn.
min(profit_b)
8404
Showing the frequency of each season among the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn. The result is stored in a Counter called 'grouped_season_b'.
grouped_season_b= Counter(season_b)
print(grouped_season_b)#showing the grouped_season_b Counter
Counter({4: 78, 3: 39})
Using the 'round_to_multiple' function to round the Opening Weekend of Drama movies that were released in Summer and Autumn to the nearest $10 Million.
open1_b = []
for i in open_b:open1_b.append(round_to_multiple(i,10000000))
print(open1_b)#showing the open1_b list
[40000000, 10000000, 0, 20000000, 10000000, 0, 20000000, 10000000, 10000000, 0, 0, 0, 10000000, 0, 0, 10000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 0, 10000000, 20000000, 30000000, 10000000, 0, 0, 10000000, 30000000, 10000000, 10000000, 10000000, 10000000, 10000000, 10000000, 20000000, 20000000, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 0, 10000000, 10000000, 0, 20000000, 10000000, 10000000, 0, 0, 0, 0, 0, 0, 60000000, 20000000, 10000000, 40000000, 0, 0, 20000000, 10000000, 20000000, 30000000, 10000000, 10000000, 30000000, 10000000, 20000000, 0, 0, 10000000, 0, 0, 10000000, 0, 20000000, 10000000, 0, 20000000, 10000000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10000000, 0, 30000000, 0, 10000000, 0, 0, 0, 0, 0, 0, 0]
Showing the frequency of each rounded Opening Weekend value of the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn. The result is stored in a Counter called 'grouped_season_b'.
grouped_season_b = Counter(open1_b)
print(grouped_season_b)#showing the grouped_season_b list
Counter({0: 65, 10000000: 32, 20000000: 12, 30000000: 5, 40000000: 2, 60000000: 1})
Using the 'round_to_multiple' function to round the Profit of Drama movies that were released in Summer and Autumn to the nearest $10 Million.
profit1_b = []
for i in profit_b:profit1_b.append(round_to_multiple(i,10000000))  # round the Profit values, not the Opening Weekend values
print(profit1_b)#showing the profit1_b list
Showing the frequency of each rounded Profit value of the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn. The result is stored in a Counter called 'grouped_season_b'.
grouped_season_b= Counter(profit1_b)
print(grouped_season_b)#showing the grouped_season_b Counter
Getting the index of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn and that have an Opening Weekend of $1 Million to $10 Million.
group_one_index = []
for i in cluster_b_index:
    if 1000000 <= df_season['Opening_Weekend'][i] <= 10000000:group_one_index.append(i)
print(group_one_index)#showing the group_one_index list
[14, 15, 20, 28, 39, 54, 62, 68, 71, 72, 76, 77, 86, 87, 90, 95, 103, 107, 110, 119, 159, 167, 171, 181, 187]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
25
Getting the index of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn and that have an Opening Weekend of more than $10 Million, up to $60 Million.
group_two_index = []
for i in cluster_b_index:
    if 10000001 <= df_season['Opening_Weekend'][i] :group_two_index.append(i)
print(group_two_index)#showing the group_two_index list
[1, 5, 9, 10, 13, 24, 51, 56, 57, 58, 60, 65, 66, 73, 81, 82, 98, 106, 109, 113, 127, 129, 130, 131, 135, 136, 137, 140, 146, 150, 151, 154, 158, 173, 177, 182, 208, 210, 213]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
39
Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million.
group_one = []
for i in group_one_index:group_one.append(df_season['Profit'][i])
print(group_one)#showing the group_one list
[26604054, 8270399, 23830713, 12499242, 8153415, 1500000, 4609597, 33102988, 4344615, 6741732, 34605762, 32973297, 48954968, 5164458, 12815212, 7423752, 11587135, 12469621, 216100000, 941214868, 2281732, 44168692, 11477345, 10369708, 33185884]
Getting the Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of more than $10 Million, up to $60 Million.
group_two = []
for i in group_two_index:group_two.append(df_season['Profit'][i])
print(group_two)#showing the group_two list
[307567189, 19966854, 129558438, 54735925, 17017873, 60133905, 14718173, 47784, 59068724, 284604712, 70975239, 70986904, 285937718, 74830111, 120036382, 81120329, 42892670, 43947950, 255500000, 58985708, 583698673, 77551594, 35552675, 163591522, 156127894, 4478084, 122498338, 136567581, 132552290, 15059418, 188120004, 41540205, 188265198, 51076141, 72831866, 143806510, 201120004, 2311944, 16283563]
Checking the number of elements in the 'group_one' list.
len(group_one)
25
Checking the number of elements in the 'group_two' list.
len(group_two)
39
Showing the frequency of each Profit value, rounded to the nearest $50 Million, of the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million. The rounded values are stored in a list called 'profit_b_one'.
profit_b_one = []
for i in group_one:profit_b_one.append(round_to_multiple(i,50000000))
collections.Counter(profit_b_one)
Counter({50000000: 7, 0: 16, 200000000: 1, 950000000: 1})
The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million is about $941 Million.
max(group_one)
941214868
The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of $1 Million to $10 Million is $1.5 Million.
min(group_one)
1500000
# Profit of $10 Million to $20 Million (6 of 25 movies)
for i in group_one:
    if 10000000 <=i<=20000000:print(i)
12499242 12815212 11587135 12469621 11477345 10369708
# Profit of more than $20 Million, up to $50 Million (8 of 25 movies)
for i in group_one:
    if 20000001 <= i<=50000000:print(i)
26604054 23830713 33102988 34605762 32973297 48954968 44168692 33185884
# Profit of more than $50 Million (2 of 25 movies)
for i in group_one:
    if 50000001 <=i:print(i)
216100000 941214868
Showing the frequency of each Profit value, rounded to the nearest $50 Million, of the Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of more than $10 Million, up to $60 Million. The rounded values are stored in a list called 'profit_b_two'.
profit_b_two = []
for i in group_two:profit_b_two.append(round_to_multiple(i,50000000))
collections.Counter(profit_b_two)
Counter({300000000: 3,
0: 8,
150000000: 6,
50000000: 13,
100000000: 4,
250000000: 1,
600000000: 1,
200000000: 3})
The maximum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of more than $10 Million, up to $60 Million, is about $584 Million.
max(group_two)
583698673
The minimum Profit of Drama movies from the 'df_season' dataframe that were released in Summer and Autumn with an Opening Weekend of more than $10 Million, up to $60 Million, is about $48,000.
min(group_two)
47784
# Profit of up to $1 Million (1 of 39 movies)
for i in group_two:
    if 0 <= i<=1000000:print(i)
47784
# Profit of more than $1 Million, up to $10 Million (2 of 39 movies)
for i in group_two:
    if 1000001<= i<=10000000:print(i)
4478084 2311944
# Profit of more than $10 Million, up to $100 Million (20 of 39 movies)
for i in group_two:
    if 10000001 <= i<=100000000:print(i)
19966854 54735925 17017873 60133905 14718173 59068724 70975239 70986904 74830111 81120329 42892670 43947950 58985708 77551594 35552675 15059418 41540205 51076141 72831866 16283563
# Profit of more than $100 Million, up to $200 Million (10 of 39 movies)
for i in group_two:
    if 100000001<= i<=200000000:print(i)
129558438 120036382 163591522 156127894 122498338 136567581 132552290 188120004 188265198 143806510
# Profit of more than $200 Million, up to $400 Million (5 of 39 movies)
for i in group_two:
    if 200000001 <= i<=400000000:print(i)
307567189 284604712 285937718 255500000 201120004
# Profit of more than $400 Million, up to $600 Million (1 of 39 movies)
for i in group_two:
    if 400000001 <= i<=600000000:print(i)
583698673
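When the ranges are contiguous, as they are for this group, the standard-library 'bisect' module can assign each Profit to its range in one step. The sketch below copies the 'group_two' values printed earlier; the names 'EDGES', 'LABELS' and 'bin_label' are made up for the illustration.

```python
from bisect import bisect_left
from collections import Counter

# Profit values copied from the 'group_two' list printed above
# (Summer and Autumn, Opening Weekend of more than $10 Million).
group_two = [307567189, 19966854, 129558438, 54735925, 17017873, 60133905,
             14718173, 47784, 59068724, 284604712, 70975239, 70986904,
             285937718, 74830111, 120036382, 81120329, 42892670, 43947950,
             255500000, 58985708, 583698673, 77551594, 35552675, 163591522,
             156127894, 4478084, 122498338, 136567581, 132552290, 15059418,
             188120004, 41540205, 188265198, 51076141, 72831866, 143806510,
             201120004, 2311944, 16283563]

# Inclusive upper bound of each range, in ascending order.
EDGES = [1_000_000, 10_000_000, 100_000_000, 200_000_000, 400_000_000, 600_000_000]
LABELS = ["up to $1M", "$1M-$10M", "$10M-$100M", "$100M-$200M",
          "$200M-$400M", "$400M-$600M", "above $600M"]


def bin_label(value):
    """Return the label of the first range whose upper bound is >= value."""
    return LABELS[bisect_left(EDGES, value)]


counts = Counter(bin_label(v) for v in group_two)
for label in LABELS:
    print(f"{label}: {counts[label]}")
```

Because 'bisect_left' treats each edge as an inclusive upper bound, the ranges match the 'low <= i <= high' checks used in the loops above.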
Creating the df_4D dataframe.
df_4D = pd.DataFrame({'Budget':r_cost+pg_cost+g_cost+pg13_cost+nc17_cost,
'Season':season_r+season_pg+season_g+season_pg13+season_nc17,
"Month_Realesed":r_month+pg_month+g_month+pg13_month+nc17_month,
"Opening_Weekend":r_opening_weekend+pg_opening_weekend
+g_opening_weekend+pg13_opening_weekend+nc17_opening_weekend})
The 'df_4D' dataframe.
df_4D
| Budget | Season | Month_Realesed | Opening_Weekend |
|---|---|---|---|
Creating a 4D scatter plot of the Budget, Season, Month Released and Opening Weekend of the Drama movies from the 'df_4D' dataframe. The 'animate' function and 'animation.FuncAnimation' are used to create the animated 4D scatter plot.
def animate(i):
    # Rotate the camera: the azimuth angle steps by 4 degrees per frame
    ax.view_init(elev=10, azim=i*4)
    return fig
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # create the 3D axes on the figure
x = df_4D['Budget']
y = df_4D['Season']
z = df_4D['Month_Realesed']
c = df_4D['Opening_Weekend']
# The colour of each point encodes its Opening Weekend (the fourth dimension)
cluster = ax.scatter(x , y, z, c=c, alpha=0.5,s=50,cmap='Reds_r')
ax.set_xlabel('Budget')
ax.set_ylabel('Season')
ax.set_zlabel('Month_Realesed')
# Build the colorbar from the scatter itself so that it shows the Opening Weekend values
fig.colorbar(cluster, ax = ax, aspect = 5, shrink = 0.5)
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfc24fb1f0>
Saving the animated 4D scatter plot gif as 'drama6.gif'.
writergif = animation.PillowWriter(fps=10)
ani.save('drama6.gif', writer=writergif)  # pass the Pillow writer explicitly
The first 4D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Season', the z-axis is the 'Month Released' and the colour axis (c) is the 'Opening Weekend'. The purpose of this animation is to partition the Drama movies from the 'Drama_DataFrame' dataframe into clusters. These clusters are based on seasons, and the Opening Weekend will be analyzed based on the budget of the movies.
Creating the cluster list.
y_predicted = []
for i in df_4D["Season"]:
    if i == 1:y_predicted.append(0)
    if i == 2:y_predicted.append(1)
    if i == 3:y_predicted.append(2)
    if i == 4:y_predicted.append(3)
Adding the cluster list to the 'df_4D' dataframe.
df_4D['cluster'] = y_predicted
The updated 'df_4D' dataframe.
df_4D
| Budget | Season | Month_Realesed | Opening_Weekend |
|---|---|---|---|
Creating a 4D scatter plot of the Budget, Season, Month Released and Opening Weekend of the Drama movies from the 'df_4D' dataframe. The 'animate' function and 'animation.FuncAnimation' are used to create the animated 4D scatter plot.
def animate(i):
# azimuth angle : 0 deg to 360 deg
ax.view_init(elev=10, azim=i*4)
return fig
fig = plt.figure()
#fig = plt.figure(figsize=(4, 15))
#fig = plt.figure()
ax = Axes3D(fig)
# Draw each season cluster in its own colour (cluster 0 = Winter ... cluster 3 = Autumn)
cluster_colors = ['#C40233', 'red', '#F400A1', 'purple']
for cluster_id, cluster_color in enumerate(cluster_colors):
    subset = df[df.cluster == cluster_id]
    ax.scatter(subset['Budget'], subset['Season'], subset['Month_Realesed'],
               alpha=0.5, s=50, color=cluster_color)
ax.set_xlabel('Budget')
ax.set_ylabel('Season')
ax.set_zlabel('Month_Realesed')
ani = animation.FuncAnimation(fig, animate,frames=90, interval=50,blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfc38f8250>
Saving the animated 4D scatter plot gif as 'drama7.gif'.
writergif = animation.PillowWriter(fps=10)  # Pillow-based GIF writer at 10 frames per second
ani.save('drama7.gif', writer=writergif)  # pass the writer explicitly so it is actually used
The first 4D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Season', the z-axis is the 'Month Released' and the c-axis is the 'Opening Weekend'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into clusters. These clusters are based on the seasons, which are Winter, Spring, Summer and Autumn, and the Opening Weekend will be analyzed against the budget of the movies.
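The month lists printed for each season in this notebook suggest that months 12, 1 and 2 carry season code 1 (Winter), months 3 to 5 code 2 (Spring), months 6 to 8 code 3 (Summer) and months 9 to 11 code 4 (Autumn). A minimal sketch of that mapping, assuming this grouping holds for every row; the function name `season_from_month` and the `SEASON_NAMES` table are hypothetical:

```python
def season_from_month(month):
    """Map a calendar month (1-12) to a season code: 1 Winter, 2 Spring, 3 Summer, 4 Autumn."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # December (12 % 12 == 0) wraps around to join January and February in Winter.
    return (month % 12) // 3 + 1


SEASON_NAMES = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"}

for month in (12, 4, 7, 10):
    print(month, SEASON_NAMES[season_from_month(month)])
```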
Getting the index of all the movies in the Drama Genre that were released in Winter, from the 'df_4D' dataframe.
cluster_a_index = []
for i,x in enumerate(df_4D.Season):
    if x == 1:cluster_a_index.append(i)
print(cluster_a_index)#showing the cluster_a_index list
[0, 3, 4, 6, 7, 16, 19, 23, 27, 29, 32, 44, 52, 53, 61, 67, 69, 70, 75, 80, 84, 85, 88, 92, 93, 94, 100, 112, 118, 121, 128, 132, 138, 141, 142, 143, 147, 149, 152, 156, 157, 163, 165, 166, 170, 176, 178, 179, 183, 185, 191, 198, 200, 202, 203, 205, 211, 219]
Checking the number of elements in the 'cluster_a_index' list.
len(cluster_a_index)
58
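The same index selection can be expressed as a small reusable helper. This is only a sketch under stated assumptions: `indexes_where` and the short `seasons` sample are hypothetical names and data, not part of the notebook.

```python
def indexes_where(values, wanted):
    """Return the positions at which `values` equals `wanted`."""
    return [position for position, value in enumerate(values) if value == wanted]


# Hypothetical season column: 1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn.
seasons = [1, 4, 2, 1, 1, 4, 1, 1]

winter_index = indexes_where(seasons, 1)
print(winter_index, len(winter_index))
```

Note that `enumerate` yields positions, while an expression such as `df_4D['Budget'][i]` looks rows up by index label. The two agree only when the dataframe has the default 0, 1, 2, ... index.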
Using the indexes from the 'cluster_a_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Winter.
season_a = []
budget_a = []
open_a = []
month_a = []
for i in cluster_a_index:
    season_a.append(df_4D['Season'][i])
    budget_a.append(df_4D['Budget'][i])
    open_a.append(df_4D['Opening_Weekend'][i])
    month_a.append(df_4D['Month_Realesed'][i])
Showing the 'season_a' list.
print(season_a)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Showing the 'month_a' list.
print(month_a)
[12, 2, 2, 12, 2, 12, 12, 1, 1, 1, 2, 12, 12, 1, 2, 12, 12, 1, 2, 1, 1, 2, 12, 12, 2, 12, 2, 12, 12, 12, 12, 12, 12, 1, 12, 12, 2, 2, 2, 2, 12, 2, 2, 12, 1, 12, 1, 1, 1, 12, 12, 2, 1, 2, 12, 12, 12, 1]
Showing the 'budget_a' list.
print(budget_a)
[100000000, 55000000, 55000000, 52500000, 40000000, 13000000, 12000000, 11000000, 7000000, 4900000, 3500000, 1000000, 100000, 2700000, 1700000, 40000000, 422000, 11800000, 17000000, 23000000, 16000000, 3000000, 20000000, 14000000, 15000000, 12000000, 8200000, 666000, 60000000, 17000000, 75000000, 50000000, 40000000, 37000000, 36000000, 35000000, 30000000, 28000000, 25000000, 25000000, 24000000, 16000000, 15000000, 15000000, 12000000, 9700000, 9000000, 7400000, 5000000, 5000000, 6500000, 15000000, 6500000, 15000000, 6500000, 1000000, 6500000, 1250000]
Getting the maximum Budget of Drama movies from the 'df_4D' dataframe that were released in Winter.
max(budget_a)
100000000
Getting the minimum Budget of Drama movies from the 'df_4D' dataframe that were released in Winter.
min(budget_a)
100000
Showing the 'open_a' list.
print(open_a)
[30122888, 46607250, 38560195, 24400000, 85171450, 1443809, 224476, 47122, 24587, 473882, 8800230, 193728, 2105729, 63356, 44542, 16755310, 0, 12177488, 22564512, 14466, 721341, 82601, 5609875, 93005, 89213, 0, 8556935, 679185, 10103675, 0, 35258, 526011, 143818, 14789393, 7102085, 24830443, 41202458, 21401594, 30468614, 13002632, 129462, 8089139, 30452, 30452, 8310232, 68266, 6213362, 13501349, 212000, 53778, 361000, 143632, 361000, 142632, 361000, 193728, 361000, 100000]
Getting the maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter.
max(open_a)
85171450
Getting the minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter.
min(open_a)
0
Showing the Frequency of the Repeated Months of the Drama movies from the 'df_4D' dataframe that were released in Winter.
print(Counter(month_a))
Counter({12: 27, 2: 17, 1: 14})
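A `Counter` can also report its entries from most to least frequent, together with each entry's share of the total. A minimal sketch using a hypothetical handful of release months rather than the notebook's data:

```python
from collections import Counter

# Hypothetical release months for a handful of Winter movies.
months = [12, 2, 2, 12, 1, 12]
month_counts = Counter(months)

# most_common() sorts by count, highest first.
for month, count in month_counts.most_common():
    share = count / len(months)
    print(f"month {month}: {count} movies ({share:.0%})")
```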
Showing the Frequency of the Repeated Budgets of the Drama movies from the 'df_4D' dataframe that were released in Winter.
print(Counter(budget_a))
Counter({15000000: 5, 6500000: 4, 40000000: 3, 12000000: 3, 55000000: 2, 1000000: 2, 17000000: 2, 16000000: 2, 25000000: 2, 5000000: 2, 100000000: 1, 52500000: 1, 13000000: 1, 11000000: 1, 7000000: 1, 4900000: 1, 3500000: 1, 100000: 1, 2700000: 1, 1700000: 1, 422000: 1, 11800000: 1, 23000000: 1, 3000000: 1, 20000000: 1, 14000000: 1, 8200000: 1, 666000: 1, 60000000: 1, 75000000: 1, 50000000: 1, 37000000: 1, 36000000: 1, 35000000: 1, 30000000: 1, 28000000: 1, 24000000: 1, 9700000: 1, 9000000: 1, 7400000: 1, 1250000: 1})
Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Winter to the nearest 1 Million.
budg_a_one = []
for i in group_one_index:budg_a_one.append(round_to_multiple(df_4D['Budget'][i],1000000))
print(budg_a_one)#showing the budg_a_one list
[20000000, 20000000, 12000000, 5000000, 2000000, 12000000, 5000000, 5000000, 15000000, 32000000, 30000000, 0, 15000000, 10000000, 5000000, 8000000, 7000000, 15000000, 30000000, 45000000, 20000000, 15000000, 12000000, 6000000, 2000000]
Showing the Frequency of the Repeated Values of the rounded Budget of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Winter.
print(Counter(budg_a_one))
Counter({5000000: 4, 15000000: 4, 20000000: 3, 12000000: 3, 2000000: 2, 30000000: 2, 32000000: 1, 0: 1, 10000000: 1, 8000000: 1, 7000000: 1, 45000000: 1, 6000000: 1})
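The definition of 'round_to_multiple' is not shown in this part of the notebook. The following is a minimal sketch of how such a helper could look, assuming it rounds to the nearest multiple with Python's built-in `round`. That assumption agrees with rounded values printed elsewhere in this notebook (for example, 22,500,000 becomes 20,000,000 at a 5 Million step), but the real implementation may differ.

```python
def round_to_multiple(number, multiple):
    """Round `number` to the nearest multiple of `multiple` (hypothetical reconstruction)."""
    # Python's round() sends exact halves to the nearest even integer,
    # so 4.5 becomes 4 and 1.5 becomes 2.
    return multiple * round(number / multiple)


print(round_to_multiple(13_000_000, 5_000_000))
print(round_to_multiple(22_500_000, 5_000_000))
print(round_to_multiple(372_920, 10_000_000))
```

With this rule, any value below half of the step rounds to 0, which is why small budgets and small opening weekends collapse into the 0 bucket of the frequency tables.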
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Winter and that have a Budget of $100,000 to $20 Million.
group_one_index = []
for i in cluster_a_index:
    if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i)
print(group_one_index)#showing the group_one_index list
[16, 19, 23, 27, 29, 32, 44, 52, 53, 61, 69, 70, 75, 84, 85, 88, 92, 93, 94, 100, 112, 121, 163, 165, 166, 170, 176, 178, 179, 183, 185, 191, 198, 200, 202, 203, 205, 211, 219]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
39
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Winter and that have a Budget of $21 Million to $100 Million.
group_two_index = []
for i in cluster_a_index:
    if 20000001 <= df_4D['Budget'][i] :group_two_index.append(i)
print(group_two_index)#showing the group_two_index list
[0, 3, 4, 6, 7, 67, 80, 118, 128, 132, 138, 141, 142, 143, 147, 149, 152, 156, 157]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
19
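The two budget groups can be built in a single pass over the season's indexes. A minimal sketch, where `split_by_budget`, the `threshold` parameter and the sample `budgets` mapping are hypothetical names and data chosen for illustration:

```python
def split_by_budget(budgets, indexes, threshold=20_000_000):
    """Partition `indexes` into (at or below threshold, above threshold) by budget."""
    low, high = [], []
    for i in indexes:
        (low if budgets[i] <= threshold else high).append(i)
    return low, high


# Hypothetical budgets keyed by row index.
budgets = {0: 100_000_000, 3: 55_000_000, 16: 13_000_000, 19: 12_000_000, 23: 20_000_000}

low_index, high_index = split_by_budget(budgets, [0, 3, 16, 19, 23])
print(low_index, high_index)
```

Since every index lands in exactly one of the two groups, the group sizes always add up to the size of the season cluster, as they do for Winter here (39 + 19 = 58).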
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million.
open_a_one = []
for i in group_one_index:open_a_one.append(df_4D['Opening_Weekend'][i])
print(open_a_one)#showing the open_a_one list
[1443809, 224476, 47122, 24587, 473882, 8800230, 193728, 2105729, 63356, 44542, 0, 12177488, 22564512, 721341, 82601, 5609875, 93005, 89213, 0, 8556935, 679185, 0, 8089139, 30452, 30452, 8310232, 68266, 6213362, 13501349, 212000, 53778, 361000, 143632, 361000, 142632, 361000, 193728, 361000, 100000]
Checking the number of elements in the 'open_a_one' list.
len(open_a_one)
39
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million.
open_a_two = []
for i in group_two_index:open_a_two.append(df_4D['Opening_Weekend'][i])
print(open_a_two)#showing the open_a_two list
[30122888, 46607250, 38560195, 24400000, 85171450, 16755310, 14466, 10103675, 35258, 526011, 143818, 14789393, 7102085, 24830443, 41202458, 21401594, 30468614, 13002632, 129462]
Checking the number of elements in the 'open_a_two' list.
len(open_a_two)
19
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 1 Million, of the Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million. The rounded values are stored in a list called 'open_a_one1'.
open_a_one1 = []
for i in group_one_index:open_a_one1.append(round_to_multiple(df_4D['Opening_Weekend'][i],1000000))
Counter(open_a_one1)
Counter({1000000: 3,
0: 26,
9000000: 2,
2000000: 1,
12000000: 1,
23000000: 1,
6000000: 2,
8000000: 2,
14000000: 1})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million is about $22.6 Million.
max(open_a_one)
22564512
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Winter with a Budget of $100,000 to $20 Million is $0.
min(open_a_one)
0
# Opening Weekend from $0 to $5 Million
for i in open_a_one:
    if 0 <= i <=5000000:print(i)
1443809 224476 47122 24587 473882 193728 2105729 63356 44542 0 721341 82601 93005 89213 0 679185 0 30452 30452 68266 212000 53778 361000 143632 361000 142632 361000 193728 361000 100000
# Opening Weekend above $5 Million, up to $10 Million
for i in open_a_one:
    if 5000001 <= i <=10000000:print(i)
8800230 5609875 8556935 8089139 8310232 6213362
# Opening Weekend above $10 Million, up to $15 Million
for i in open_a_one:
    if 10000001 <= i <=15000000:print(i)
12177488 13501349
# Opening Weekend above $20 Million
for i in open_a_one:
    if 20000001 <= i :print(i)
22564512
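Instead of one loop per range, the values can be assigned to bins in a single pass and then counted. A minimal sketch using the standard library; `EDGES`, `LABELS`, `bin_label` and the sample `openings` list are hypothetical names and data, and the upper edge of each bin is inclusive to match the `<=` comparisons used in the loops.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the bins, in dollars; anything above the last edge falls in "over 20M".
EDGES = [5_000_000, 10_000_000, 15_000_000, 20_000_000]
LABELS = ["0-5M", "5-10M", "10-15M", "15-20M", "over 20M"]


def bin_label(value):
    """Return the label of the first bin whose upper edge is >= value."""
    return LABELS[bisect_left(EDGES, value)]


# Hypothetical opening-weekend values.
openings = [1_443_809, 8_800_230, 12_177_488, 22_564_512, 0, 5_000_000]

counts = Counter(bin_label(value) for value in openings)
print(counts)
```

A single set of edges also guarantees that the bins are contiguous, so no value can fall into a gap between two ranges.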
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 10 Million, of the Drama movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million. The rounded values are stored in a separate list called 'open_a_two_rounded', so that the 'open_a_two' list keeps the exact values.
open_a_two_rounded = []
for i in group_two_index:open_a_two_rounded.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
Counter(open_a_two_rounded)
Counter({30000000: 2,
50000000: 1,
40000000: 2,
20000000: 4,
90000000: 1,
0: 5,
10000000: 4})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million is about $85 Million.
max(open_a_two)
85171450
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Winter with a Budget of $21 Million to $100 Million is about $14,500.
min(open_a_two)
14466
# Opening Weekend from $0 to $20 Million
for i in open_a_two:
    if 0 <= i <=20000000:print(i)
16755310 14466 10103675 35258 526011 143818 14789393 7102085 13002632 129462
# Opening Weekend above $20 Million, up to $40 Million
for i in open_a_two:
    if 20000001 <= i <=40000000:print(i)
30122888 38560195 24400000 24830443 21401594 30468614
# Opening Weekend above $40 Million, up to $60 Million
for i in open_a_two:
    if 40000001 <= i <=60000000:print(i)
46607250 41202458
# Opening Weekend above $80 Million, up to $100 Million
for i in open_a_two:
    if 80000001 <= i <=100000000:print(i)
85171450
Getting the index of all the movies in the Drama Genre that were released in Spring, from the 'df_4D' dataframe.
cluster_b_index = []
for i,x in enumerate(df_4D.Season):
    if x == 2:cluster_b_index.append(i)
print(cluster_b_index)#showing the cluster_b_index list
[2, 12, 17, 21, 35, 36, 40, 43, 46, 47, 50, 59, 64, 74, 78, 79, 83, 96, 99, 101, 102, 104, 111, 115, 116, 117, 122, 126, 139, 144, 145, 148, 153, 155, 161, 162, 164, 174, 175, 180, 184, 186, 192, 194, 195, 197, 207, 218, 220, 224]
Checking the number of elements in the 'cluster_b_index' list.
len(cluster_b_index)
50
Using the indexes from the 'cluster_b_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Spring.
season_b = []
budget_b = []
open_b = []
month_b = []
for i in cluster_b_index:
    season_b.append(df_4D['Season'][i])
    budget_b.append(df_4D['Budget'][i])
    open_b.append(df_4D['Opening_Weekend'][i])
    month_b.append(df_4D['Month_Realesed'][i])
Showing the 'season_b' list.
print(season_b)
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Showing the 'month_b' list.
print(month_b)
[5, 4, 4, 3, 5, 3, 4, 3, 5, 4, 5, 3, 3, 4, 3, 3, 5, 5, 3, 5, 4, 3, 5, 3, 4, 4, 3, 5, 4, 5, 4, 4, 4, 4, 4, 3, 3, 5, 4, 4, 3, 5, 3, 4, 4, 3, 4, 5, 3, 4]
Showing the 'budget_b' list.
print(budget_b)
[60000000, 22500000, 13000000, 12000000, 3000000, 2000000, 2000000, 1500000, 1000000, 135000, 8500000, 20000000, 95000000, 8000000, 20000000, 2000000, 10000000, 17000000, 4500000, 28000000, 700000, 22000000, 2500000, 22000000, 18000000, 8200000, 10000000, 1700000, 38000000, 35000000, 34000000, 30000000, 25000000, 25000000, 17000000, 17000000, 16000000, 10000000, 10000000, 7000000, 5000000, 2600000, 12500000, 20000, 955472, 9000000, 3565572, 6500000, 12000, 612072]
Getting the maximum Budget of Drama movies from the 'df_4D' dataframe that were released in Spring.
max(budget_b)
95000000
Getting the minimum Budget of Drama movies from the 'df_4D' dataframe that were released in Spring.
min(budget_b)
12000
Showing the 'open_b' list.
print(open_b)
[14953664, 1220335, 237264, 160547, 246914, 6661234, 81006, 3762145, 63461, 36134, 118150, 16007426, 67877361, 6011585, 16007426, 9244641, 124011, 16015408, 46977, 5088381, 0, 16021684, 0, 16021684, 4625583, 0, 0, 0, 16842353, 372920, 13019686, 13203458, 22618358, 9783603, 9851102, 15002635, 20874072, 11727390, 2215891, 446380, 4690214, 55438, 69100, 0, 0, 738339, 24286, 738339, 70188, 0]
Getting the maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring.
max(open_b)
67877361
Getting the minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring.
min(open_b)
0
Showing the Frequency of the Repeated Months of the Drama movies from the 'df_4D' dataframe that were released in Spring.
print(Counter(month_b))
Counter({4: 20, 3: 17, 5: 13})
Showing the Frequency of the Repeated Budgets of the Drama movies from the 'df_4D' dataframe that were released in Spring.
print(Counter(budget_b))
Counter({10000000: 4, 2000000: 3, 17000000: 3, 20000000: 2, 22000000: 2, 25000000: 2, 60000000: 1, 22500000: 1, 13000000: 1, 12000000: 1, 3000000: 1, 1500000: 1, 1000000: 1, 135000: 1, 8500000: 1, 95000000: 1, 8000000: 1, 4500000: 1, 28000000: 1, 700000: 1, 2500000: 1, 18000000: 1, 8200000: 1, 1700000: 1, 38000000: 1, 35000000: 1, 34000000: 1, 30000000: 1, 16000000: 1, 7000000: 1, 5000000: 1, 2600000: 1, 12500000: 1, 20000: 1, 955472: 1, 9000000: 1, 3565572: 1, 6500000: 1, 12000: 1, 612072: 1})
Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Spring to the nearest 5 Million.
bud_b_one = []
for i in cluster_b_index:bud_b_one.append(round_to_multiple(df_4D['Budget'][i],5000000))
print(bud_b_one)#showing the bud_b_one list
[60000000, 20000000, 15000000, 10000000, 5000000, 0, 0, 0, 0, 0, 10000000, 20000000, 95000000, 10000000, 20000000, 0, 10000000, 15000000, 5000000, 30000000, 0, 20000000, 0, 20000000, 20000000, 10000000, 10000000, 0, 40000000, 35000000, 35000000, 30000000, 25000000, 25000000, 15000000, 15000000, 15000000, 10000000, 10000000, 5000000, 5000000, 5000000, 10000000, 0, 0, 10000000, 5000000, 5000000, 0, 0]
Showing the Frequency of the Repeated Values of the rounded Budget of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Spring.
print(Counter(bud_b_one))
Counter({0: 13, 10000000: 10, 5000000: 7, 20000000: 6, 15000000: 5, 30000000: 2, 35000000: 2, 25000000: 2, 60000000: 1, 95000000: 1, 40000000: 1})
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Spring and that have a Budget of $12,000 to $20 Million.
group_one_index = []
for i in cluster_b_index:
    if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i)
print(group_one_index)#showing the group_one_index list
[17, 21, 35, 36, 40, 43, 46, 47, 50, 59, 74, 78, 79, 83, 96, 99, 102, 111, 116, 117, 122, 126, 161, 162, 164, 174, 175, 180, 184, 186, 192, 194, 195, 197, 207, 218, 220, 224]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
38
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Spring and that have a Budget of $21 Million to $95 Million.
group_two_index = []
for i in cluster_b_index:
    if 20000001 <= df_4D['Budget'][i] <= 100000000:group_two_index.append(i)
print(group_two_index)#showing the group_two_index list
[2, 12, 64, 101, 104, 115, 139, 144, 145, 148, 153, 155]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
12
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million.
open_b_one = []
for i in group_one_index:open_b_one.append(df_4D['Opening_Weekend'][i])
print(open_b_one)#showing the open_b_one list
[237264, 160547, 246914, 6661234, 81006, 3762145, 63461, 36134, 118150, 16007426, 6011585, 16007426, 9244641, 124011, 16015408, 46977, 0, 0, 4625583, 0, 0, 0, 9851102, 15002635, 20874072, 11727390, 2215891, 446380, 4690214, 55438, 69100, 0, 0, 738339, 24286, 738339, 70188, 0]
Checking the number of elements in the 'open_b_one' list.
len(open_b_one)
38
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million.
open_b_two = []
for i in group_two_index:open_b_two.append(df_4D['Opening_Weekend'][i])
print(open_b_two)#showing the open_b_two list
[14953664, 1220335, 67877361, 5088381, 16021684, 16021684, 16842353, 372920, 13019686, 13203458, 22618358, 9783603]
Checking the number of elements in the 'open_b_two' list.
len(open_b_two)
12
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 1 Million, of the Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million. The rounded values are stored in a separate list called 'open_b_one_rounded', so that the 'open_b_one' list keeps the exact values.
open_b_one_rounded = []
for i in group_one_index:open_b_one_rounded.append(round_to_multiple(df_4D['Opening_Weekend'][i],1000000))
Counter(open_b_one_rounded)
Counter({0: 22,
7000000: 1,
4000000: 1,
16000000: 3,
6000000: 1,
9000000: 1,
5000000: 2,
10000000: 1,
15000000: 1,
21000000: 1,
12000000: 1,
2000000: 1,
1000000: 2})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million is about $20.9 Million.
max(open_b_one)
20874072
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Spring with a Budget of $12,000 to $20 Million is $0.
min(open_b_one)
0
# Opening Weekend from $0 to $5 Million
for i in open_b_one:
    if 0 <= i <=5000000:print(i)
237264 160547 246914 81006 3762145 63461 36134 118150 124011 46977 0 0 4625583 0 0 0 2215891 446380 4690214 55438 69100 0 0 738339 24286 738339 70188 0
# Opening Weekend above $5 Million, up to $10 Million
for i in open_b_one:
    if 5000001 <= i <=10000000:print(i)
6661234 6011585 9244641 9851102
# Opening Weekend above $10 Million, up to $15 Million
for i in open_b_one:
    if 10000001 <= i <=15000000:print(i)
11727390
# Opening Weekend above $15 Million, up to $20 Million
for i in open_b_one:
    if 15000001 <= i <=20000000:print(i)
16007426 16007426 16015408 15002635
# Opening Weekend above $20 Million
for i in open_b_one:
    if 20000001 <= i :print(i)
20874072
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 10 Million, of the Drama movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million. The rounded values are stored in a separate list called 'open_b_two_rounded', so that the 'open_b_two' list keeps the exact values.
open_b_two_rounded = []
for i in group_two_index:open_b_two_rounded.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
Counter(open_b_two_rounded)
Counter({10000000: 5, 0: 2, 70000000: 1, 20000000: 4})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million is about $67.9 Million.
max(open_b_two)
67877361
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Spring with a Budget of $21 Million to $95 Million is about $373,000.
min(open_b_two)
372920
# Rounded Opening Weekend from $0 to $20 Million
for i in open_b_two_rounded:
    if 0 <= i <=20000000:print(i)
10000000 0 10000000 20000000 20000000 20000000 0 10000000 10000000 20000000 10000000
# Rounded Opening Weekend above $20 Million
for i in open_b_two_rounded:
    if 20000001 <= i :print(i)
70000000
Getting the index of all the movies in the Drama Genre that were released in Summer, from the 'df_4D' dataframe.
cluster_c_index = []
for i,x in enumerate(df_4D.Season):
    if x == 3:cluster_c_index.append(i)
print(cluster_c_index)#showing the cluster_c_index list
[14, 24, 31, 37, 39, 48, 51, 60, 63, 65, 68, 73, 81, 82, 90, 95, 97, 98, 106, 107, 109, 110, 114, 119, 120, 136, 146, 151, 154, 158, 167, 171, 172, 182, 190, 212, 213, 216, 217]
Checking the number of elements in the 'cluster_c_index' list.
len(cluster_c_index)
39
Using the indexes from the 'cluster_c_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Summer.
season_c = []
budget_c = []
open_c = []
month_c = []
for i in cluster_c_index:
    season_c.append(df_4D['Season'][i])
    budget_c.append(df_4D['Budget'][i])
    open_c.append(df_4D['Opening_Weekend'][i])
    month_c.append(df_4D['Month_Realesed'][i])
Showing the 'season_c' list.
print(season_c)
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
Showing the 'month_c' list.
print(month_c)
[8, 6, 7, 6, 8, 7, 7, 8, 6, 8, 8, 6, 7, 7, 7, 6, 7, 7, 8, 7, 6, 8, 7, 6, 8, 7, 7, 8, 8, 6, 8, 7, 8, 7, 7, 7, 8, 6, 7]
Showing the 'budget_c' list.
print(budget_c)
[20000000, 10000000, 4000000, 2000000, 2000000, 100000, 20000000, 3000000, 10000000, 3000000, 5000000, 40000000, 32000000, 90000000, 5000000, 7500000, 5000000, 22000000, 23000000, 15000000, 70000000, 30000000, 10000000, 45000000, 858000, 44000000, 33000000, 25000000, 25000000, 20000000, 15000000, 12000000, 11000000, 5000000, 175000, 904765, 34000000, 1000000, 1500000]
Getting the maximum Budget of Drama movies from the 'df_4D' dataframe that were released in Summer.
max(budget_c)
90000000
Getting the minimum Budget of Drama movies from the 'df_4D' dataframe that were released in Summer.
min(budget_c)
100000
Showing the 'open_c' list.
print(open_c)
[9700000, 13575172, 387618, 84797, 1767308, 104030, 13307125, 11351389, 0, 11351389, 8146533, 13616196, 24517121, 20584908, 2189966, 2534729, 518795, 12146143, 10028065, 7810481, 21037414, 8742545, 220297, 1586753, 0, 12381585, 11731703, 26044590, 12305016, 18723269, 5079566, 5467084, 187281, 21688103, 77740, 0, 11166687, 0, 85709]
Getting the maximum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer.
max(open_c)
26044590
Getting the minimum Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer.
min(open_c)
0
Showing the Frequency of the Repeated Months of the Drama movies from the 'df_4D' dataframe that were released in Summer.
print(Counter(month_c))
Counter({7: 17, 8: 13, 6: 9})
Showing the Frequency of the Repeated Budgets of the Drama movies from the 'df_4D' dataframe that were released in Summer.
print(Counter(budget_c))
Counter({5000000: 4, 20000000: 3, 10000000: 3, 2000000: 2, 3000000: 2, 15000000: 2, 25000000: 2, 4000000: 1, 100000: 1, 40000000: 1, 32000000: 1, 90000000: 1, 7500000: 1, 22000000: 1, 23000000: 1, 70000000: 1, 30000000: 1, 45000000: 1, 858000: 1, 44000000: 1, 33000000: 1, 12000000: 1, 11000000: 1, 175000: 1, 904765: 1, 34000000: 1, 1000000: 1, 1500000: 1})
Using the 'round_to_multiple' function to round the Budget of Drama movies that were released in Summer to the nearest 5 Million.
bud_c_one = []
for i in cluster_c_index:bud_c_one.append(round_to_multiple(df_4D['Budget'][i],5000000))
print(bud_c_one)#showing the bud_c_one list
[20000000, 10000000, 5000000, 0, 0, 0, 20000000, 5000000, 10000000, 5000000, 5000000, 40000000, 30000000, 90000000, 5000000, 10000000, 5000000, 20000000, 25000000, 15000000, 70000000, 30000000, 10000000, 45000000, 0, 45000000, 35000000, 25000000, 25000000, 20000000, 15000000, 10000000, 10000000, 5000000, 0, 0, 35000000, 0, 0]
Showing the Frequency of the Repeated Values of the rounded Budget of the Drama movies from the 'Drama_DataFrame' dataframe that were released in Summer.
print(Counter(bud_c_one))
Counter({0: 8, 5000000: 7, 10000000: 6, 20000000: 4, 25000000: 3, 30000000: 2, 15000000: 2, 45000000: 2, 35000000: 2, 40000000: 1, 90000000: 1, 70000000: 1})
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Summer and that have a Budget of $100,000 to $20 Million.
group_one_index = []
for i in cluster_c_index:
    if 0 <= df_4D['Budget'][i] <= 20000000:group_one_index.append(i)
print(group_one_index)#showing the group_one_index list
[14, 24, 31, 37, 39, 48, 51, 60, 63, 65, 68, 90, 95, 97, 107, 114, 120, 158, 167, 171, 172, 182, 190, 212, 216, 217]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
26
Getting the index of Drama movies from the 'df_4D' dataframe that were released in Summer and that have a Budget of $21 Million to $90 Million.
group_two_index = []
for i in cluster_c_index:
    if 20000001 <= df_4D['Budget'][i] <= 90000000:group_two_index.append(i)
print(group_two_index)#showing the group_two_index list
[73, 81, 82, 98, 106, 109, 110, 119, 136, 146, 151, 154, 213]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
13
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million.
open_c_one = []
for i in group_one_index:open_c_one.append(df_4D['Opening_Weekend'][i])
print(open_c_one)#showing the open_c_one list
[9700000, 13575172, 387618, 84797, 1767308, 104030, 13307125, 11351389, 0, 11351389, 8146533, 2189966, 2534729, 518795, 7810481, 220297, 0, 18723269, 5079566, 5467084, 187281, 21688103, 77740, 0, 0, 85709]
Checking the number of elements in the 'open_c_one' list.
len(open_c_one)
26
Getting the Opening Weekend of Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million.
open_c_two = []
for i in group_two_index:open_c_two.append(df_4D['Opening_Weekend'][i])
print(open_c_two)#showing the open_c_two list
[13616196, 24517121, 20584908, 12146143, 10028065, 21037414, 8742545, 1586753, 12381585, 11731703, 26044590, 12305016, 11166687]
Checking the number of elements in the 'open_c_two' list.
len(open_c_two)
13
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 1 Million, of the Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million. The rounded values are stored in a separate list called 'open_c_one_rounded', so that the 'open_c_one' list keeps the exact values.
open_c_one_rounded = []
for i in group_one_index:open_c_one_rounded.append(round_to_multiple(df_4D['Opening_Weekend'][i],1000000))
Counter(open_c_one_rounded)
Counter({10000000: 1,
14000000: 1,
0: 11,
2000000: 2,
13000000: 1,
11000000: 2,
8000000: 2,
3000000: 1,
1000000: 1,
19000000: 1,
5000000: 2,
22000000: 1})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million is about $21.7 Million.
max(open_c_one)
21688103
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Summer with a Budget of $100,000 to $20 Million is $0.
min(open_c_one)
0
# Opening Weekend from $0 to $5 Million
for i in open_c_one:
    if 0 <= i <= 5000000:print(i)
387618 84797 1767308 104030 0 2189966 2534729 518795 220297 0 187281 77740 0 0 85709
# Opening Weekend above $5 Million, up to $10 Million
for i in open_c_one:
    if 5000001 <= i <=10000000:print(i)
9700000 8146533 7810481 5079566 5467084
# Opening Weekend above $10 Million, up to $15 Million
for i in open_c_one:
    if 10000001 <= i <=15000000:print(i)
13575172 13307125 11351389 11351389
# Opening Weekend above $15 Million, up to $20 Million
for i in open_c_one:
    if 15000001 <= i <=20000000:print(i)
18723269
# Opening Weekend above $20 Million
for i in open_c_one:
    if 20000001 <= i :print(i)
21688103
Showing the Frequency of the Repeated Values of the Opening Weekend, rounded to the nearest 10 Million, of the Drama movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million. The rounded values are stored in a separate list called 'open_c_two_rounded', so that the 'open_c_two' list keeps the exact values.
open_c_two_rounded = []
for i in group_two_index:open_c_two_rounded.append(round_to_multiple(df_4D['Opening_Weekend'][i],10000000))
Counter(open_c_two_rounded)
Counter({10000000: 8, 20000000: 3, 0: 1, 30000000: 1})
The maximum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million is about $26 Million.
max(open_c_two)
26044590
The minimum Opening Weekend of Drama Movies from the 'df_4D' dataframe that were released in Summer with a Budget of $21 Million to $90 Million is about $1.6 Million.
min(open_c_two)
1586753
# Rounded Opening Weekend from $0 to $5 Million
for i in open_c_two_rounded:
    if 0 <= i <=5000000:print(i)
0
# Rounded Opening Weekend from $5 Million to $10 Million
for i in open_c_two_rounded:
    if 5000000 <= i <=10000000:print(i)
10000000 10000000 10000000 10000000 10000000 10000000 10000000 10000000
# Rounded Opening Weekend above $15 Million, up to $20 Million
for i in open_c_two_rounded:
    if 15000001 <= i <=20000000:print(i)
20000000 20000000 20000000
# Rounded Opening Weekend above $25 Million, up to $30 Million
for i in open_c_two_rounded:
    if 25000001 <= i <=30000000:print(i)
30000000
Getting the index of all the movies in the Drama Genre that were released in Autumn, from the 'df_4D' dataframe.
cluster_d_index = []
for i,x in enumerate(df_4D.Season):
    if x == 4:cluster_d_index.append(i)
print(cluster_d_index)#showing the cluster_d_index list
[1, 5, 8, 9, 10, 11, 13, 15, 18, 20, 22, 25, 26, 28, 30, 33, 34, 38, 41, 42, 45, 49, 54, 55, 56, 57, 58, 62, 66, 71, 72, 76, 77, 86, 87, 89, 91, 103, 105, 108, 113, 123, 124, 125, 127, 129, 130, 131, 133, 134, 135, 137, 140, 150, 159, 160, 168, 169, 173, 177, 181, 187, 188, 189, 193, 196, 199, 201, 204, 206, 208, 209, 210, 214, 215, 221, 222, 223]
Checking the number of elements in the 'cluster_d_index' list.
len(cluster_d_index)
78
Using the indexes from the 'cluster_d_index' list to get the Season, Budget, Opening Weekend and Month Released of each movie that was released in Autumn.
season_d = []
budget_d = []
open_d = []
month_d = []
for i in cluster_d_index:
    season_d.append(df_4D['Season'][i])
    budget_d.append(df_4D['Budget'][i])
    open_d.append(df_4D['Opening_Weekend'][i])
    month_d.append(df_4D['Month_Realesed'][i])
Showing the 'season_d' list.
print(season_d)
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
Showing the 'month_d' list.
print(month_d)
[10, 10, 9, 11, 10, 11, 11, 10, 10, 9, 11, 11, 11, 10, 9, 10, 10, 10, 10, 9, 10, 9, 10, 9, 11, 9, 11, 10, 11, 10, 10, 11, 9, 11, 10, 10, 9, 11, 11, 10, 10, 11, 10, 9, 10, 9, 11, 11, 10, 11, 11, 10, 11, 10, 9, 11, 10, 9, 11, 10, 9, 9, 11, 10, 10, 9, 10, 10, 10, 9, 9, 9, 10, 10, 11, 9, 10, 10]
Showing the 'budget_d' list.
print(budget_d)
[61000000, 55000000, 37500000, 31000000, 23000000, 22500000, 21000000, 20000000, 13000000, 12000000, 11800000, 9400000, 8500000, 5000000, 4750000, 3400000, 3300000, 2000000, 2000000, 1987650, 1000000, 6000000, 11500000, 9000000, 180000000, 37000000, 20000000, 5100000, 20000000, 15000000, 32000000, 30000000, 500000, 15000000, 10000000, 12000000, 7000000, 7000000, 20000000, 2700000, 85000000, 6400000, 13000000, 1750000, 110000000, 60000000, 55000000, 50000000, 50000000, 49000000, 47000000, 40000000, 37000000, 26000000, 20000000, 19000000, 14000000, 13000000, 11000000, 9000000, 6000000, 2000000, 1400000, 250000, 1000000, 1500000, 15000000, 4000000, 4074940, 1000000, 12000000, 15000000, 350000, 230000, 1000000, 15000000, 2200000, 50000]
Getting the maximum Budget of the Drama movies from the 'df_4D' dataframe that were released in Autumn.
max(budget_d)
180000000
Getting the minimum Budget of the Drama movies from the 'df_4D' dataframe that were released in Autumn.
min(budget_d)
50000
Showing how often each release month occurs among the Drama movies from the 'df_4D' dataframe that were released in Autumn.
print(Counter(month_d))
Counter({10: 35, 11: 23, 9: 20})
Showing how often each Budget value occurs among the Drama movies from the 'df_4D' dataframe that were released in Autumn.
print(Counter(budget_d))
Counter({20000000: 5, 15000000: 5, 1000000: 4, 13000000: 3, 12000000: 3, 2000000: 3, 55000000: 2, 6000000: 2, 9000000: 2, 37000000: 2, 7000000: 2, 50000000: 2, 61000000: 1, 37500000: 1, 31000000: 1, 23000000: 1, 22500000: 1, 21000000: 1, 11800000: 1, 9400000: 1, 8500000: 1, 5000000: 1, 4750000: 1, 3400000: 1, 3300000: 1, 1987650: 1, 11500000: 1, 180000000: 1, 5100000: 1, 32000000: 1, 30000000: 1, 500000: 1, 10000000: 1, 2700000: 1, 85000000: 1, 6400000: 1, 1750000: 1, 110000000: 1, 60000000: 1, 49000000: 1, 47000000: 1, 40000000: 1, 26000000: 1, 19000000: 1, 14000000: 1, 11000000: 1, 1400000: 1, 250000: 1, 1500000: 1, 4000000: 1, 4074940: 1, 350000: 1, 230000: 1, 2200000: 1, 50000: 1})
Using the 'round_to_multiple' function to round the Budget of the Drama movies that were released in Autumn to the nearest $1 Million.
bud_d_one = []
for i in cluster_d_index:
    bud_d_one.append(round_to_multiple(df_4D['Budget'][i], 1000000))
print(bud_d_one)  # showing the bud_d_one list
[61000000, 55000000, 38000000, 31000000, 23000000, 22000000, 21000000, 20000000, 13000000, 12000000, 12000000, 9000000, 8000000, 5000000, 5000000, 3000000, 3000000, 2000000, 2000000, 2000000, 1000000, 6000000, 12000000, 9000000, 180000000, 37000000, 20000000, 5000000, 20000000, 15000000, 32000000, 30000000, 0, 15000000, 10000000, 12000000, 7000000, 7000000, 20000000, 3000000, 85000000, 6000000, 13000000, 2000000, 110000000, 60000000, 55000000, 50000000, 50000000, 49000000, 47000000, 40000000, 37000000, 26000000, 20000000, 19000000, 14000000, 13000000, 11000000, 9000000, 6000000, 2000000, 1000000, 0, 1000000, 2000000, 15000000, 4000000, 4000000, 1000000, 12000000, 15000000, 0, 0, 1000000, 15000000, 2000000, 0]
Showing how often each rounded Budget value occurs among the Drama movies from the 'df_4D' dataframe that were released in Autumn.
print(Counter(bud_d_one))
Counter({2000000: 7, 20000000: 5, 12000000: 5, 1000000: 5, 15000000: 5, 0: 5, 13000000: 3, 9000000: 3, 5000000: 3, 3000000: 3, 6000000: 3, 55000000: 2, 37000000: 2, 7000000: 2, 50000000: 2, 4000000: 2, 61000000: 1, 38000000: 1, 31000000: 1, 23000000: 1, 22000000: 1, 21000000: 1, 8000000: 1, 180000000: 1, 32000000: 1, 30000000: 1, 10000000: 1, 85000000: 1, 110000000: 1, 60000000: 1, 49000000: 1, 47000000: 1, 40000000: 1, 26000000: 1, 19000000: 1, 14000000: 1, 11000000: 1})
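The definition of 'round_to_multiple' is not shown in this part of the notebook. One plausible form, consistent with the rounded budgets printed above, is sketched below. This is an assumption, not the notebook's actual function, and it is named `round_to_multiple_sketch` to make that explicit. It relies on Python's built-in `round`, which rounds exact halves to the nearest even integer.

```python
def round_to_multiple_sketch(number, multiple):
    """Round `number` to the nearest multiple of `multiple`.

    Hypothetical helper: built on Python's round(), which sends exact
    halves to the nearest even integer (22.5 -> 22, 37.5 -> 38).
    """
    return multiple * round(number / multiple)


# Four budgets taken from the 'budget_d' list above.
budgets = [37500000, 22500000, 4750000, 500000]
rounded = [round_to_multiple_sketch(b, 1000000) for b in budgets]
print(rounded)  # [38000000, 22000000, 5000000, 0]
```

Under this assumption, budgets below $500,000 round down to 0, which would explain the zeros in the rounded list.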
Getting the index of the Drama movies from the 'df_4D' dataframe that were released in Autumn and that have a Budget of $50,000 to $20 Million.
group_one_index = []
for i in cluster_d_index:
    if 0 <= df_4D['Budget'][i] <= 20000000:
        group_one_index.append(i)
print(group_one_index)  # showing the group_one_index list
[15, 18, 20, 22, 25, 26, 28, 30, 33, 34, 38, 41, 42, 45, 49, 54, 55, 58, 62, 66, 71, 77, 86, 87, 89, 91, 103, 105, 108, 123, 124, 125, 159, 160, 168, 169, 173, 177, 181, 187, 188, 189, 193, 196, 199, 201, 204, 206, 208, 209, 210, 214, 215, 221, 222, 223]
Checking the number of elements in the 'group_one_index' list.
len(group_one_index)
56
Getting the index of the Drama movies from the 'df_4D' dataframe that were released in Autumn and that have a Budget above $20 Million (the $21 Million to $200 Million group).
group_two_index = []
for i in cluster_d_index:
    if 20000001 <= df_4D['Budget'][i]:
        group_two_index.append(i)
print(group_two_index)  # showing the group_two_index list
[1, 5, 8, 9, 10, 11, 13, 56, 57, 72, 76, 113, 127, 129, 130, 131, 133, 134, 135, 137, 140, 150]
Checking the number of elements in the 'group_two_index' list.
len(group_two_index)
22
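The two budget groups above are built in separate passes over 'cluster_d_index'. As a minimal sketch, the same split can be done in one pass around a single threshold, which makes it easy to confirm that every movie lands in exactly one group (56 + 22 = 78 in the notebook). The `autumn_budgets` dictionary below is a made-up stand-in for the index-to-budget lookup, not real rows from 'df_4D'.

```python
# Hypothetical index -> budget pairs standing in for the Autumn Drama rows.
autumn_budgets = {1: 55000000, 15: 20000000, 18: 13000000, 56: 37000000, 223: 50000}

THRESHOLD = 20000000  # boundary between the two groups, as in the cells above

group_one, group_two = [], []
for idx, budget in autumn_budgets.items():
    # Budgets up to and including the threshold go to group one.
    if budget <= THRESHOLD:
        group_one.append(idx)
    else:
        group_two.append(idx)

print(group_one)  # [15, 18, 223]
print(group_two)  # [1, 56]
```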
Getting the Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million.
open_d_one = []
for i in group_one_index:
    open_d_one.append(df_4D['Opening_Weekend'][i])
print(open_d_one)  # showing the open_d_one list
[5100000, 118298, 2002165, 253510, 257174, 256498, 7485546, 52041, 561906, 135388, 156833, 18623, 100268, 137651, 170335, 2337594, 287081, 27547866, 1203011, 27547866, 5268764, 6836036, 1528982, 2739680, 298277, 89054, 2914486, 162146, 0, 0, 0, 0, 4765838, 105005, 76244, 228359, 15679190, 14065500, 4750894, 9112839, 20321, 128140, 0, 85709, 63918, 100316, 100316, 649423, 11014818, 63918, 25775847, 31665, 245398, 63918, 130303, 0]
Checking the number of elements in the 'open_d_one' list.
len(open_d_one)
56
Getting the Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million.
open_d_two = []
for i in group_two_index:
    open_d_two.append(df_4D['Opening_Weekend'][i])
print(open_d_two)  # showing the open_d_two list
[37513109, 13143310, 736311, 24900566, 10470145, 492648, 19497324, 11364505, 19152401, 9178233, 9421369, 11457353, 55785112, 22403596, 11947744, 35574710, 220522, 320690, 24074047, 15371203, 29632823, 10003827]
Checking the number of elements in the 'open_d_two' list.
len(open_d_two)
22
Showing how often each rounded Opening Weekend value occurs among the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million. Each value is rounded to the nearest $1 Million and stored in a list called 'open_d_one', which is then counted with 'Counter'.
open_d_one = []
for i in group_one_index:
    open_d_one.append(round_to_multiple(df_4D['Opening_Weekend'][i], 1000000))
Counter(open_d_one)
Counter({5000000: 4,
0: 35,
2000000: 3,
7000000: 2,
1000000: 3,
28000000: 2,
3000000: 2,
16000000: 1,
14000000: 1,
9000000: 1,
11000000: 1,
26000000: 1})
The maximum Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million is $27,547,866 (about $28 Million).
max(open_d_one)
27547866
The minimum Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $50,000 to $20 Million is $0.
min(open_d_one)
0
# Rounded opening weekends from $0 to $5 Million
for i in open_d_one:
    if 0 <= i <= 5000000:
        print(i)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Rounded opening weekends above $5 Million and up to $10 Million
for i in open_d_one:
    if 5000001 <= i <= 10000000:
        print(i)
10000000 10000000 10000000 10000000 10000000 10000000 10000000
# Rounded opening weekends from $15 Million to $20 Million
for i in open_d_one:
    if 15000000 <= i <= 20000000:
        print(i)
20000000
# Rounded opening weekends from $20 Million to $25 Million
# (a value of exactly 20000000 also matches the previous range)
for i in open_d_one:
    if 20000000 <= i <= 25000000:
        print(i)
20000000
# Rounded opening weekends above $25 Million
for i in open_d_one:
    if 25000001 <= i:
        print(i)
30000000 30000000 30000000
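The range checks above are written one cell at a time, which leaves 10000001 to 14999999 unchecked and counts 20000000 in two ranges. As a hedged alternative, the sketch below assigns each value to exactly one bin using the standard-library `bisect` module. The bin edges, labels and sample values are illustrative assumptions, not part of the notebook.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper edges of each bin, in dollars; the last bin is open-ended.
edges = [5000000, 10000000, 15000000, 20000000, 25000000]
labels = ["0-5M", "5-10M", "10-15M", "15-20M", "20-25M", "25M+"]


def bin_label(value):
    # bisect_left finds the first edge >= value, so an edge belongs to the bin it closes.
    return labels[bisect_left(edges, value)]


# Made-up sample of rounded opening weekends.
sample = [0, 0, 10000000, 20000000, 30000000, 5000000]
counts = Counter(bin_label(v) for v in sample)

print(counts["0-5M"])       # 3
print(bin_label(20000000))  # 15-20M
```

Because every value maps to a single label, the bin counts always add up to the number of movies.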
Showing how often each rounded Opening Weekend value occurs among the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million. Each value is rounded to the nearest $10 Million and stored in a list called 'open_d_two', which is then counted with 'Counter'.
open_d_two = []
for i in group_two_index:
    open_d_two.append(round_to_multiple(df_4D['Opening_Weekend'][i], 10000000))
Counter(open_d_two)
Counter({40000000: 2,
10000000: 8,
0: 4,
20000000: 6,
60000000: 1,
30000000: 1})
The maximum Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million is $55,785,112 (about $60 Million when rounded to the nearest $10 Million).
max(open_d_two)
55785112
The minimum Opening Weekend of the Drama movies from the 'df_4D' dataframe that were released in Autumn with a Budget of $21 Million to $200 Million is $220,522 (about $220,000).
min(open_d_two)
220522
# Rounded opening weekends from $0 to $10 Million
for i in open_d_two:
    if 0 <= i <= 10000000:
        print(i)
10000000 0 10000000 0 10000000 10000000 10000000 10000000 10000000 0 0 10000000
# Rounded opening weekends above $10 Million and up to $20 Million
for i in open_d_two:
    if 10000001 <= i <= 20000000:
        print(i)
20000000 20000000 20000000 20000000 20000000 20000000
# Rounded opening weekends above $20 Million and up to $30 Million
for i in open_d_two:
    if 20000001 <= i <= 30000000:
        print(i)
30000000
# Rounded opening weekends above $30 Million and up to $40 Million
for i in open_d_two:
    if 30000001 <= i <= 40000000:
        print(i)
40000000 40000000
# Rounded opening weekends above $40 Million
for i in open_d_two:
    if 40000001 <= i:
        print(i)
60000000
Creating the 'df_profit_season' dataframe.
df_profit_season = pd.DataFrame({
    'Budget': r_cost + pg_cost + g_cost + pg13_cost + nc17_cost,
    'Season': season_r + season_pg + season_g + season_pg13 + season_nc17,
    'Profit': profit_int + profit_int1 + profit_int2 + profit_int3 + profit_int4,
    'Name': name + name1 + name2 + name3 + name4,
})
The 'df_profit_season' dataframe. (this dataframe is interactive)
df_profit_season
| Budget | Season | Profit | Name |
|---|---|---|---|
Creating a 3D scatter plot of the Budget, Season and Profit of the movies in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'animate' function and the 'animation.FuncAnimation' class are used to create an animated 3D scatter plot object.
def animate(i):
    # azimuth angle: 4 degrees per frame, sweeping a full rotation over 90 frames
    ax.view_init(elev=10, azim=i*4)
    return fig
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # replaces the deprecated Axes3D(fig)
ax.scatter(df['Budget'], df['Season'], df['Profit'], alpha=0.5, s=50, color='#bd1783')
ax.set_zlim3d(0, 800000000)
ax.set_xlabel('Budget')
ax.set_ylabel('Season')
ax.set_zlabel('Profit')
ani = animation.FuncAnimation(fig, animate, frames=90, interval=50, blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfc4e7a370>
Saving the animated 3D scatter plot as the GIF file 'drama8.gif'.
writergif = animation.PillowWriter(fps=10)  # 10 fps, the rate the GIF was saved at
ani.save('drama8.gif', writer=writergif)  # pass the Pillow writer explicitly
The fourth 3D Scatter Plot (part A): the x-axis is the 'Budget', the y-axis is the 'Season' and the z-axis is the 'Profit'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into clusters based on season. The profit of each cluster will be summed and averaged, and the Standard Deviation of each season's Profit will be used to see which season is the most profitable and the most consistent.
Creating a second 3D scatter plot of the Budget, Season and Profit of the movies in the Drama Genre from the 'Drama_DataFrame' dataframe, this time with each season drawn in its own colour. The 'animate' function and the 'animation.FuncAnimation' class are used to create an animated 3D scatter plot object.
def animate(i):
    # azimuth angle: 4 degrees per frame, sweeping a full rotation over 90 frames
    ax.view_init(elev=10, azim=i*4)
    return fig
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # replaces the deprecated Axes3D(fig)
# One subset per season (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn)
df1 = df[df.Season==1]
df2 = df[df.Season==2]
df3 = df[df.Season==3]
df4 = df[df.Season==4]
x1 = df1['Budget']
y1 = df1['Season']
z1 = df1['Profit']
x2 = df2['Budget']
y2 = df2['Season']
z2 = df2['Profit']
x3 = df3['Budget']
y3 = df3['Season']
z3 = df3['Profit']
x4 = df4['Budget']
y4 = df4['Season']
z4 = df4['Profit']
# Each season is plotted in its own shade
ax.scatter(x1, y1, z1, alpha=0.5, s=50, color='#ed93cd')
ax.scatter(x2, y2, z2, alpha=0.5, s=50, color='#a64885')
ax.scatter(x3, y3, z3, alpha=0.5, s=50, color='#96276f')
ax.scatter(x4, y4, z4, alpha=0.5, s=50, color='#780d53')
ax.set_zlim3d(0, 800000000)
ax.set_xlabel('Budget')
ax.set_ylabel('Season')
ax.set_zlabel('Profit')
ani = animation.FuncAnimation(fig, animate, frames=90, interval=50, blit=False)
ani
<matplotlib.animation.FuncAnimation at 0x2bfbb947fa0>
Saving the animated 3D scatter plot as the GIF file 'drama9.gif'.
writergif = animation.PillowWriter(fps=10)  # 10 fps, the rate the GIF was saved at
ani.save('drama9.gif', writer=writergif)  # pass the Pillow writer explicitly
The fourth 3D Scatter Plot (part B): the x-axis is the 'Budget', the y-axis is the 'Season' and the z-axis is the 'Profit'. The purpose of this animation is to partition the movies in the Drama Genre from the 'Drama_DataFrame' dataframe into clusters based on season. The profit of each cluster will be summed and averaged, and the Standard Deviation of each season's Profit will be used to see which season is the most profitable and the most consistent.
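The per-season summary described above (total, average and Standard Deviation of Profit) can be sketched with the standard library alone. The `rows` data below is made up for illustration, and the use of the population standard deviation (`pstdev`) rather than the sample version is an assumption. A lower standard deviation is read here as a more consistent season.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (season, profit) pairs; 1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn.
rows = [(1, 100), (1, 300), (2, 50), (2, 70), (4, 10), (4, 10)]

# Group the profits by season code.
profits_by_season = defaultdict(list)
for season, profit in rows:
    profits_by_season[season].append(profit)

# Total, mean and population standard deviation for each season.
summary = {
    season: {"total": sum(p), "mean": mean(p), "std": pstdev(p)}
    for season, p in profits_by_season.items()
}

for season, stats in sorted(summary.items()):
    print(season, stats)
```

In this toy data, season 1 has the highest mean profit while season 4 is the most consistent, which shows why both measures are worth reporting side by side.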
Getting the Names and Profit of the movies that were released in the Winter.
profit_1 = []
name_1 = []
for i, x in enumerate(df_profit_season.Season):
    if x == 1:  # 1 = Winter
        profit_1.append(df_profit_season.Profit[i])
        name_1.append(df_profit_season.Name[i])
Showing the 'profit_1' list.
print(profit_1)
[349948323, 326398492, 316350619, 82112435, 530998101, 318266710, 7859167, 45178935, 3765283, 12636004, 36954520, 15566240, 1851683, 556082, 10531500, 176601214, 26696000, 35694916, 120587063, 83269971, 118582776, 3101815, 107956187, 21856053, 104285432, 28716963, 71808942, 3851000, 30482317, 55071636, 559454789, 129748880, 129590606, 60143987, 49309093, 217276928, 167618160, 66050951, 117033509, 57917283, 40282881, 36545707, 113955898, 5601987, 20909437, 27087044, 12971021, 23787727, 36699612, 1205034, 13912841, 121165, 13912841, 307113, 13912841, 15566240, 13912841, 34897711]
Checking the number of elements in the 'profit_1' list.
len(profit_1)
58
Showing the 'name_1' list.
print(name_1)
['Django Unchained', 'Fifty Shades Darker', 'Fifty Shades Freed', 'Zero Dark Thirty', 'Fifty Shades of Grey', 'Black Swan', 'If Beale Street Could Talk', 'Quartet', 'We Need to Talk About Kevin', 'Mommy', 'The Witch', 'Blue Valentine', 'Ghost Story', 'Zoot Suit', 'The Lunchbox', 'Little Women', 'The Jazz Singer', 'A Walk to Remember', 'Bridge to Terabithia', "Mr. Holland's Opus", 'Sense and Sensibility', 'The Secret of Roan Inish', 'Forever Young', 'Taps', 'On Golden Pond', 'Absence of Malice', 'Footloose', 'Lassie Come Home', 'The Tale of Despereaux', 'My Fair Lady 1964', 'Sing', 'The Post', 'The Impossible', 'The Rite', 'Collateral Beauty', 'True Grit', 'The Vow', 'Safe Haven', 'Dear John', 'Rings', 'Fences', 'The Roommate', 'The Woman in Black', 'Country Strong', 'Project Almanac', 'Amour', 'Black or White', 'The Bye Bye Man', 'Still Alice', 'Rabbit Hole', 'Shame', 'The Dreamers', 'Shame', 'The Dreamers', 'Shame', 'Blue Valentine', 'Shame', 'Last Tango in Paris']
Checking the number of elements in the 'name_1' list.
len(name_1)
58
Getting the Names and Profit of the movies that were released in the Spring.
profit_2 = []
name_2 = []
for i, x in enumerate(df_profit_season.Season):
    if x == 2:  # 2 = Spring
        profit_2.append(df_profit_season.Profit[i])
        name_2.append(df_profit_season.Name[i])
Showing the 'profit_2' list.
print(profit_2)
[24154026, 8554727, 25358392, 34913, 20251930, 14610760, 88390, 12744931, 156309, 294448, 68711836, 72678948, 447351353, 10948425, 69137047, 62667874, 3835130, 108052686, 3943124, 20000000, 1711143, 58693537, 1250000, 58491516, 293281000, 278014195, 37707417, 10300000, 78809717, 26721826, 29802928, 38984536, 71633833, 4847480, 317522294, 21028230, 40506120, 51603136, 21556959, 29964656, 13945682, 12698355, 4856268, 257845, 659312, 89410061, 256669, 94673038, 401802, 858737]
Checking the number of elements in the 'profit_2' list.
len(profit_2)
50
Showing the 'name_2' list.
print(name_2)
['Priest', 'The Water Diviner', 'Ex Machina', 'Stoker', 'Before Midnight', 'Silent House', 'Locke', 'Unsane', 'Palo Alto', 'Sound of My Voice', 'Fame', 'The Last Song', 'Cinderella', 'Akeelah and the Bee', 'The Last Song', "God's Not Dead", 'The Spanish Prisoner', 'Rocky III', 'Tender Mercies', 'The Natural', 'A Sunday in the Country', 'The Rookie', 'Pollyanna', 'The Rookie', 'The Secret Garden', 'The Sound of Music', "Hachiko: A Dog's Story", 'Three Cions in the Fountain', 'Water for Elephants', 'The Tree of Life', 'The Longest Ride', 'The Age of Adaline', 'The Lucky One', 'Draft Day', 'A Quiet Place', 'Beastly', 'Remember Me', 'Everything, Everything', 'Mud', 'Gifted', 'Before I Fall', 'Ida', 'Matador', 'Tokyo Decadence', 'Wide Sargasso Sea', 'Crash', 'Elles', 'Crash', 'Pink Flamingos', 'Law of Desire']
Checking the number of elements in the 'name_2' list.
len(name_2)
50
Getting the Names and Profit of the movies that were released in the Summer.
profit_3 = []
name_3 = []
for i, x in enumerate(df_profit_season.Season):
    if x == 3:  # 3 = Summer
        profit_3.append(df_profit_season.Profit[i])
        name_3.append(df_profit_season.Name[i])
Showing the 'profit_3' list.
print(profit_3)
[26604054, 60133905, 53273049, 14131551, 8153415, 2669782, 14718173, 70975239, 36918287, 70986904, 33102988, 74830111, 120036382, 81120329, 12815212, 7423752, 544368315, 42892670, 43947950, 12469621, 255500000, 216100000, 7657973, 941214868, 267142000, 4478084, 132552290, 188120004, 41540205, 188265198, 44168692, 11477345, 67356170, 143806510, 1927779, 2548651, 16283563, 8000000, 18912216]
Checking the number of elements in the 'profit_3' list.
len(profit_3)
39
Showing the 'name_3' list.
print(name_3)
['The Debt', 'Hereditary', 'Boyhood', "Winter's Bone", 'We Are Your Friends', 'A Ghost Story', 'Endless Love', 'War Room', 'Urban Cowboy', 'War Room', 'Overcomer', 'The Lake House', 'Phenomenon', 'Contact', 'Honeysuckle Rose', 'The Night the Lights Went Out in Georgia', 'Tex', 'Staying Alive', 'The Little Rascals', 'Ramona and Beezus', 'The Hunchback of Notre Drame', 'Babe', 'Kit Kittredge: An American Girl', 'The Lion King 1994', 'Bambi 1942', 'Charlie St. Cloud', 'Step Up Revolution', 'The Help', 'The Giver', 'Me Before You', 'One Day', 'Wish Upon', 'If I Stay', 'Lights Out', 'Another Earth', 'Arabian Nights', 'Natural Born Killers', 'Beyond the Valley of the Dolls', 'Kids']
Checking the number of elements in the 'name_3' list.
len(name_3)
39
Getting the Names and Profit of the movies that were released in the Autumn.
profit_4 = []
name_4 = []
for i, x in enumerate(df_profit_season.Season):
    if x == 4:  # 4 = Autumn
        profit_4.append(df_profit_season.Profit[i])
        name_4.append(df_profit_season.Name[i])
Showing the 'profit_4' list.
print(profit_4)
[307567189, 19966854, 13147416, 129558438, 54735925, 9898681, 17017873, 8270399, 23262783, 23830713, 31043521, 12417298, 69233867, 12499242, 222016, 17033227, 35669037, 9295324, 4328516, 19282640, 4438911, 48766923, 1500000, 2000000, 47784, 59068724, 284604712, 4609597, 285937718, 4344615, 6741732, 34605762, 32973297, 48954968, 5164458, 31440294, 150297525, 11587135, 418656843, 35099643, 58985708, 23794409, 52500000, 5850377, 583698673, 77551594, 35552675, 163591522, 58660270, 22004627, 156127894, 122498338, 136567581, 15059418, 2281732, 57086711, 20044909, 20069303, 51076141, 72831866, 10369708, 33185884, 4152584, 3478400, 8404, 18912216, 52091915, 15465835, 15390895, 1315026, 201120004, 50167430, 2311944, 3664240, 1038916, 50167430, 3546453, 958404]
Checking the number of elements in the 'profit_4' list.
len(profit_4)
78
Showing the 'name_4' list.
print(name_4)
['Gone Girl', 'Crimson Peak', 'The Master', 'Flight', 'The Ides of March', 'Nocturnal Animals', 'For Colored Girls', 'Let Me In', 'Room', 'Arbitrage', 'Carol', 'Melancholia', 'Manchester by the Sea', 'Addicted', 'Take Shelter', 'Margin Call', 'Whiplash', 'The Florida Project', 'Knock Knock', 'Buried', 'Martha Marcy May Marlene', 'Ordinary People', 'Rich and Famous', 'Raggedy Man', 'Hugo', 'Dolphin Tale', 'Wonder', 'Somewhere in Time', 'Wonder', 'Tuck Everlasting', 'Dreamer', 'August Rush', 'Fireproof', 'The Remains of the Day', 'Pure Country', 'A River Runs Through It', 'Resurrection', 'Prancer', 'Beauty and the Beast 1991', 'The Black Stallion', "Charlotte's Web", 'Giant', 'The Ten Commandments 1966', 'The Quiet Man', 'Gravity', 'Contagion', 'Burlesque', 'Creed II', 'Hereafter', 'Anna Karenina', 'Arrival', 'Bridge of Spies', 'Creed', 'The Best of Me', 'The Light Between Oceans', 'The Book Thief', 'Suffragette', 'The Perks of Being a Wallflower', 'Brooklyn', 'Ouija: Origin of Evil', 'The Words', 'Courageous', 'Mustang', 'Like Crazy', 'Whore', 'Kids', 'Lust, Caution', 'Blue Is the Warmest Colour', 'Blue Is the Warmest Colour', 'Two Girls and a Guy', 'Hell', 'Se, jie', 'The Evil Dead', 'Clerks', 'Bad Lieutenant', 'Lust, Caution ', 'Happiness 1998', 'Whore 1991']
Checking the number of elements in the 'name_4' list.
len(name_4)
78
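The four season cells above repeat the same loop with a different season code each time. As a rough sketch, one pass with `zip` can fill the name and profit lists for every season at once. The three lists below are made-up stand-ins for the 'Season', 'Name' and 'Profit' columns of 'df_profit_season'.

```python
from collections import defaultdict

# Hypothetical stand-ins for the Season, Name and Profit columns.
seasons = [1, 4, 1, 3]
names = ["Film A", "Film B", "Film C", "Film D"]
profits = [500, 200, 300, 100]

# One entry per season code, each holding parallel name and profit lists.
by_season = defaultdict(lambda: {"names": [], "profits": []})
for season, name, profit in zip(seasons, names, profits):
    by_season[season]["names"].append(name)
    by_season[season]["profits"].append(profit)

print(by_season[1]["names"])    # ['Film A', 'Film C']
print(by_season[1]["profits"])  # [500, 300]
```

Appending the name and profit in the same iteration keeps the two lists aligned, so position `k` in one list always refers to the same movie as position `k` in the other.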
Getting the number of movies that made a Profit in the Winter.
sum_1 = []
for i in profit_1:
    if i < 0:
        continue  # skip movies that made a loss
    sum_1.append(i)
len(sum_1)
58
Repeating the total Profit generated by the profitable Drama movies in the Winter once for each movie in the 'profit_1' list.
var1 = []
for i in profit_1:
    var1.append(sum(sum_1))
print(var1)  # showing the var1 list
[5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506, 5027270506]
Getting the number of movies that made a Profit in the Spring.
sum_2 = []
for i in profit_2:
    if i < 0:
        continue  # skip movies that made a loss
    sum_2.append(i)
len(sum_2)
50
Repeating the total Profit generated by the profitable Drama movies in the Spring once for each movie in the 'profit_2' list.
var2 = []
for i in profit_2:
    var2.append(sum(sum_2))
print(var2)  # showing the var2 list
[2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541, 2664023541]
Getting the number of movies that made a Profit in the Summer.
sum_3 = []
for i in profit_3:
    if i < 0:
        continue  # skip movies that made a loss
    sum_3.append(i)
len(sum_3)
39
Repeating the total Profit generated by the profitable Drama movies in the Summer once for each movie in the 'profit_3' list.
var3 = []
for i in profit_3:
    var3.append(sum(sum_3))
print(var3)  # showing the var3 list
[3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237, 3888623237]
Getting the number of movies that made a Profit in the Autumn.
sum_4 = []
for i in profit_4:
    if i < 0:
        continue  # skip movies that made a loss
    sum_4.append(i)
len(sum_4)
78
Repeating the total Profit generated by the profitable Drama movies in the Autumn once for each movie in the 'profit_4' list.
var4 = []
for i in profit_4:
    var4.append(sum(sum_4))
print(var4)  # showing the var4 list
[4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036, 4492301036]
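The 'var' lists above recompute the same sum on every loop iteration. A minimal sketch of an equivalent approach is to total the non-negative profits once and then repeat that number with list multiplication. The sample uses the first three Winter profits from the 'profit_1' list; the variable names are illustrative.

```python
# First three Winter profits from the 'profit_1' list above.
profit_sample = [349948323, 326398492, 316350619]

# Keep only non-negative profits, then total them once.
profitable = [p for p in profit_sample if p >= 0]
total = sum(profitable)

# One copy of the total per movie in the season.
repeated_total = [total] * len(profit_sample)

print(total)           # 992697434
print(repeated_total)  # [992697434, 992697434, 992697434]
```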
Putting the Names of the Drama movies that were released in Winter with the corresponding Profit for the visualizations below.
for x in range(len(name_1)):
    print("[ '", name_1[x], "'", ',', profit_1[x], '],')
[ ' Django Unchained ' , 349948323 ], [ ' Fifty Shades Darker ' , 326398492 ], [ ' Fifty Shades Freed ' , 316350619 ], [ ' Zero Dark Thirty ' , 82112435 ], [ ' Fifty Shades of Grey ' , 530998101 ], [ ' Black Swan ' , 318266710 ], [ ' If Beale Street Could Talk ' , 7859167 ], [ ' Quartet ' , 45178935 ], [ ' We Need to Talk About Kevin ' , 3765283 ], [ ' Mommy ' , 12636004 ], [ ' The Witch ' , 36954520 ], [ ' Blue Valentine ' , 15566240 ], [ ' Ghost Story ' , 1851683 ], [ ' Zoot Suit ' , 556082 ], [ ' The Lunchbox ' , 10531500 ], [ ' Little Women ' , 176601214 ], [ ' The Jazz Singer ' , 26696000 ], [ ' A Walk to Remember ' , 35694916 ], [ ' Bridge to Terabithia ' , 120587063 ], [ ' Mr. Holland's Opus ' , 83269971 ], [ ' Sense and Sensibility ' , 118582776 ], [ ' The Secret of Roan Inish ' , 3101815 ], [ ' Forever Young ' , 107956187 ], [ ' Taps ' , 21856053 ], [ ' On Golden Pond ' , 104285432 ], [ ' Absence of Malice ' , 28716963 ], [ ' Footloose ' , 71808942 ], [ ' Lassie Come Home ' , 3851000 ], [ ' The Tale of Despereaux ' , 30482317 ], [ ' My Fair Lady 1964 ' , 55071636 ], [ ' Sing ' , 559454789 ], [ ' The Post ' , 129748880 ], [ ' The Impossible ' , 129590606 ], [ ' The Rite ' , 60143987 ], [ ' Collateral Beauty ' , 49309093 ], [ ' True Grit ' , 217276928 ], [ ' The Vow ' , 167618160 ], [ ' Safe Haven ' , 66050951 ], [ ' Dear John ' , 117033509 ], [ ' Rings ' , 57917283 ], [ ' Fences ' , 40282881 ], [ ' The Roommate ' , 36545707 ], [ ' The Woman in Black ' , 113955898 ], [ ' Country Strong ' , 5601987 ], [ ' Project Almanac ' , 20909437 ], [ ' Amour ' , 27087044 ], [ ' Black or White ' , 12971021 ], [ ' The Bye Bye Man ' , 23787727 ], [ ' Still Alice ' , 36699612 ], [ ' Rabbit Hole ' , 1205034 ], [ ' Shame ' , 13912841 ], [ ' The Dreamers ' , 121165 ], [ ' Shame ' , 13912841 ], [ ' The Dreamers ' , 307113 ], [ ' Shame ' , 13912841 ], [ ' Blue Valentine ' , 15566240 ], [ ' Shame ' , 13912841 ], [ ' Last Tango in Paris ' , 34897711 ],
Putting the Names of the Drama movies that were released in Spring with the corresponding Profit for the visualizations below.
for x in range(len(name_2)):
    print("[ '", name_2[x], "'", ',', profit_2[x], '],')
[ ' Priest ' , 24154026 ], [ ' The Water Diviner ' , 8554727 ], [ ' Ex Machina ' , 25358392 ], [ ' Stoker ' , 34913 ], [ ' Before Midnight ' , 20251930 ], [ ' Silent House ' , 14610760 ], [ ' Locke ' , 88390 ], [ ' Unsane ' , 12744931 ], [ ' Palo Alto ' , 156309 ], [ ' Sound of My Voice ' , 294448 ], [ ' Fame ' , 68711836 ], [ ' The Last Song ' , 72678948 ], [ ' Cinderella ' , 447351353 ], [ ' Akeelah and the Bee ' , 10948425 ], [ ' The Last Song ' , 69137047 ], [ ' God's Not Dead ' , 62667874 ], [ ' The Spanish Prisoner ' , 3835130 ], [ ' Rocky III ' , 108052686 ], [ ' Tender Mercies ' , 3943124 ], [ ' The Natural ' , 20000000 ], [ ' A Sunday in the Country ' , 1711143 ], [ ' The Rookie ' , 58693537 ], [ ' Pollyanna ' , 1250000 ], [ ' The Rookie ' , 58491516 ], [ ' The Secret Garden ' , 293281000 ], [ ' The Sound of Music ' , 278014195 ], [ ' Hachiko: A Dog's Story ' , 37707417 ], [ ' Three Cions in the Fountain ' , 10300000 ], [ ' Water for Elephants ' , 78809717 ], [ ' The Tree of Life ' , 26721826 ], [ ' The Longest Ride ' , 29802928 ], [ ' The Age of Adaline ' , 38984536 ], [ ' The Lucky One ' , 71633833 ], [ ' Draft Day ' , 4847480 ], [ ' A Quiet Place ' , 317522294 ], [ ' Beastly ' , 21028230 ], [ ' Remember Me ' , 40506120 ], [ ' Everything, Everything ' , 51603136 ], [ ' Mud ' , 21556959 ], [ ' Gifted ' , 29964656 ], [ ' Before I Fall ' , 13945682 ], [ ' Ida ' , 12698355 ], [ ' Matador ' , 4856268 ], [ ' Tokyo Decadence ' , 257845 ], [ ' Wide Sargasso Sea ' , 659312 ], [ ' Crash ' , 89410061 ], [ ' Elles ' , 256669 ], [ ' Crash ' , 94673038 ], [ ' Pink Flamingos ' , 401802 ], [ ' Law of Desire ' , 858737 ],
Putting the Names of the Drama movies that were released in Summer with the corresponding Profit for the visualizations below.
for x in range(len(name_3)):
    print("[ '", name_3[x], "'", ',', profit_3[x], '],')
[ ' The Debt ' , 26604054 ], [ ' Hereditary ' , 60133905 ], [ ' Boyhood ' , 53273049 ], [ ' Winter's Bone ' , 14131551 ], [ ' We Are Your Friends ' , 8153415 ], [ ' A Ghost Story ' , 2669782 ], [ ' Endless Love ' , 14718173 ], [ ' War Room ' , 70975239 ], [ ' Urban Cowboy ' , 36918287 ], [ ' War Room ' , 70986904 ], [ ' Overcomer ' , 33102988 ], [ ' The Lake House ' , 74830111 ], [ ' Phenomenon ' , 120036382 ], [ ' Contact ' , 81120329 ], [ ' Honeysuckle Rose ' , 12815212 ], [ ' The Night the Lights Went Out in Georgia ' , 7423752 ], [ ' Tex ' , 544368315 ], [ ' Staying Alive ' , 42892670 ], [ ' The Little Rascals ' , 43947950 ], [ ' Ramona and Beezus ' , 12469621 ], [ ' The Hunchback of Notre Drame ' , 255500000 ], [ ' Babe ' , 216100000 ], [ ' Kit Kittredge: An American Girl ' , 7657973 ], [ ' The Lion King 1994 ' , 941214868 ], [ ' Bambi 1942 ' , 267142000 ], [ ' Charlie St. Cloud ' , 4478084 ], [ ' Step Up Revolution ' , 132552290 ], [ ' The Help ' , 188120004 ], [ ' The Giver ' , 41540205 ], [ ' Me Before You ' , 188265198 ], [ ' One Day ' , 44168692 ], [ ' Wish Upon ' , 11477345 ], [ ' If I Stay ' , 67356170 ], [ ' Lights Out ' , 143806510 ], [ ' Another Earth ' , 1927779 ], [ ' Arabian Nights ' , 2548651 ], [ ' Natural Born Killers ' , 16283563 ], [ ' Beyond the Valley of the Dolls ' , 8000000 ], [ ' Kids ' , 18912216 ],
Putting the Names of the Drama movies that were released in Autumn with the corresponding Profit for the visualizations below.
for x in range(len(name_4)):
    print("[ '", name_4[x], "'", ',', profit_4[x], '],')
[ ' Gone Girl ' , 307567189 ], [ ' Crimson Peak ' , 19966854 ], [ ' The Master ' , 13147416 ], [ ' Flight ' , 129558438 ], [ ' The Ides of March ' , 54735925 ], [ ' Nocturnal Animals ' , 9898681 ], [ ' For Colored Girls ' , 17017873 ], [ ' Let Me In ' , 8270399 ], [ ' Room ' , 23262783 ], [ ' Arbitrage ' , 23830713 ], [ ' Carol ' , 31043521 ], [ ' Melancholia ' , 12417298 ], [ ' Manchester by the Sea ' , 69233867 ], [ ' Addicted ' , 12499242 ], [ ' Take Shelter ' , 222016 ], [ ' Margin Call ' , 17033227 ], [ ' Whiplash ' , 35669037 ], [ ' The Florida Project ' , 9295324 ], [ ' Knock Knock ' , 4328516 ], [ ' Buried ' , 19282640 ], [ ' Martha Marcy May Marlene ' , 4438911 ], [ ' Ordinary People ' , 48766923 ], [ ' Rich and Famous ' , 1500000 ], [ ' Raggedy Man ' , 2000000 ], [ ' Hugo ' , 47784 ], [ ' Dolphin Tale ' , 59068724 ], [ ' Wonder ' , 284604712 ], [ ' Somewhere in Time ' , 4609597 ], [ ' Wonder ' , 285937718 ], [ ' Tuck Everlasting ' , 4344615 ], [ ' Dreamer ' , 6741732 ], [ ' August Rush ' , 34605762 ], [ ' Fireproof ' , 32973297 ], [ ' The Remains of the Day ' , 48954968 ], [ ' Pure Country ' , 5164458 ], [ ' A River Runs Through It ' , 31440294 ], [ ' Resurrection ' , 150297525 ], [ ' Prancer ' , 11587135 ], [ ' Beauty and the Beast 1991 ' , 418656843 ], [ ' The Black Stallion ' , 35099643 ], [ ' Charlotte's Web ' , 58985708 ], [ ' Giant ' , 23794409 ], [ ' The Ten Commandments 1966 ' , 52500000 ], [ ' The Quiet Man ' , 5850377 ], [ ' Gravity ' , 583698673 ], [ ' Contagion ' , 77551594 ], [ ' Burlesque ' , 35552675 ], [ ' Creed II ' , 163591522 ], [ ' Hereafter ' , 58660270 ], [ ' Anna Karenina ' , 22004627 ], [ ' Arrival ' , 156127894 ], [ ' Bridge of Spies ' , 122498338 ], [ ' Creed ' , 136567581 ], [ ' The Best of Me ' , 15059418 ], [ ' The Light Between Oceans ' , 2281732 ], [ ' The Book Thief ' , 57086711 ], [ ' Suffragette ' , 20044909 ], [ ' The Perks of Being a Wallflower ' , 20069303 ], [ ' Brooklyn ' , 51076141 ], [ ' Ouija: Origin of Evil ' , 
72831866 ], [ ' The Words ' , 10369708 ], [ ' Courageous ' , 33185884 ], [ ' Mustang ' , 4152584 ], [ ' Like Crazy ' , 3478400 ], [ ' Whore ' , 8404 ], [ ' Kids ' , 18912216 ], [ ' Lust, Caution ' , 52091915 ], [ ' Blue Is the Warmest Colour ' , 15465835 ], [ ' Blue Is the Warmest Colour ' , 15390895 ], [ ' Two Girls and a Guy ' , 1315026 ], [ ' Hell ' , 201120004 ], [ ' Se, jie ' , 50167430 ], [ ' The Evil Dead ' , 2311944 ], [ ' Clerks ' , 3664240 ], [ ' Bad Lieutenant ' , 1038916 ], [ ' Lust, Caution ' , 50167430 ], [ ' Happiness 1998 ' , 3546453 ], [ ' Whore 1991 ' , 958404 ],
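The printed lists above are pasted by hand into Highcharts `data` arrays, and a title with an apostrophe (for example `God's Not Dead`) comes out unescaped inside single quotes. A minimal sketch of one alternative, assuming the same kind of parallel name and profit lists, is to let `json.dumps` handle the quoting. The helper name `to_highcharts_rows` and the two-item sample are illustrative and not part of the original notebook.

```python
import json

def to_highcharts_rows(names, profits):
    """Return a JSON array of [name, profit] pairs for a Highcharts series."""
    # strip() removes stray padding; int() keeps the profit a plain number.
    rows = [[name.strip(), int(profit)] for name, profit in zip(names, profits)]
    return json.dumps(rows, ensure_ascii=False)

# Two values copied from the Summer output above, used only as a sample.
sample_names = ["Winter's Bone", "Hereditary"]
sample_profits = [14131551, 60133905]
print(to_highcharts_rows(sample_names, sample_profits))
```

Because the output is valid JSON, it can be pasted into a `data: [...]` entry without escaping apostrophes manually.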
Showing the frequency of the profits of the Drama movies from the 'df_profit_season' dataframe that were released in Winter, with each profit rounded to the nearest 10 million. The rounded values are stored in a list called 'winter_group' and counted with Counter.
winter_group = []
for i in profit_1:
    winter_group.append(round_to_multiple(i, 10000000))
Counter(winter_group)
Counter({350000000: 1,
330000000: 1,
320000000: 2,
80000000: 2,
530000000: 1,
10000000: 9,
50000000: 2,
0: 8,
40000000: 5,
20000000: 5,
180000000: 1,
30000000: 5,
120000000: 3,
110000000: 2,
100000000: 1,
70000000: 2,
60000000: 3,
560000000: 1,
130000000: 2,
220000000: 1,
170000000: 1})
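The helper `round_to_multiple` is called above, but its definition does not appear in this section. A minimal sketch, assuming it rounds to the nearest multiple (which is consistent with the Counter output, where a profit of 559454789 falls in the 560000000 bucket), could look like this. The body below is an assumption, not the notebook's own definition.

```python
from collections import Counter

def round_to_multiple(number, multiple):
    """Round number to the nearest multiple (assumed behaviour)."""
    return multiple * round(number / multiple)

# A few Winter profits taken from the outputs in this section.
sample_profits = [559454789, 530998101, 121165, 45178935]
sample_group = [round_to_multiple(p, 10_000_000) for p in sample_profits]
print(Counter(sample_group))
```

Rounding to the nearest multiple means a bucket labelled 0 holds profits below 5 million, not only profits of exactly zero.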
The maximum profit of Drama movies from the 'df_profit_season' dataframe that were released in Winter is about $560 million.
max(profit_1)
559454789
The minimum profit of Drama movies from the 'df_profit_season' dataframe that were released in Winter is about $120,000.
min(profit_1)
121165
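The dollar figures quoted in the prose are rounded by hand from the raw `max` and `min` outputs. A small sketch of a formatter that derives the label directly from the value keeps the two in step. `format_dollars` is a hypothetical helper introduced here for illustration.

```python
def format_dollars(amount):
    """Format a raw dollar amount as a short, readable label."""
    if abs(amount) >= 1_000_000_000:
        return f"${amount / 1_000_000_000:.1f} Billion"
    if abs(amount) >= 1_000_000:
        return f"${amount / 1_000_000:.1f} Million"
    return f"${amount:,.0f}"

# The Winter maximum and minimum shown above.
print(format_dollars(559454789))
print(format_dollars(121165))
```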
# Winter profits from 0 to 50 million
for i in profit_1:
    if 0 <= i <= 50000000:
        print(i)
7859167 45178935 3765283 12636004 36954520 15566240 1851683 556082 10531500 26696000 35694916 3101815 21856053 28716963 3851000 30482317 49309093 40282881 36545707 5601987 20909437 27087044 12971021 23787727 36699612 1205034 13912841 121165 13912841 307113 13912841 15566240 13912841 34897711
# Winter profits from 50 to 100 million
for i in profit_1:
    if 50000000 <= i <= 100000000:
        print(i)
82112435 83269971 71808942 55071636 60143987 66050951 57917283
# Winter profits from 100 to 200 million
for i in profit_1:
    if 100000000 <= i <= 200000000:
        print(i)
176601214 120587063 118582776 107956187 104285432 129748880 129590606 167618160 117033509 113955898
# Winter profits from 200 to 300 million
for i in profit_1:
    if 200000000 <= i <= 300000000:
        print(i)
217276928
# Winter profits from 300 to 400 million
for i in profit_1:
    if 300000000 <= i <= 400000000:
        print(i)
349948323 326398492 316350619 318266710
# Winter profits from 400 to 600 million
for i in profit_1:
    if 400000000 <= i <= 600000000:
        print(i)
530998101 559454789
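The range filters above share their endpoints (`0 <= i <= 50000000` and `50000000 <= i <= 100000000`), so a profit that sits exactly on a boundary would be printed twice. A minimal sketch of a single binning pass with half-open intervals avoids that and returns the count per range in one call. The function name `bin_counts` and the sample list are assumptions made for illustration.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values per bin [edges[k], edges[k+1]); the last bin also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue  # value lies outside every bin
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts

# Same range boundaries as the Winter filters above.
edges = [0, 50_000_000, 100_000_000, 200_000_000,
         300_000_000, 400_000_000, 600_000_000]
# A handful of Winter profits taken from the outputs above.
sample = [121165, 49309093, 82112435, 176601214,
          217276928, 349948323, 530998101, 559454789]
print(bin_counts(sample, edges))
```

Dividing each count by `len(sample)` gives the share of movies per range.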
Showing the frequency of the profits of the Drama movies from the 'df_profit_season' dataframe that were released in Spring, with each profit rounded to the nearest 10 million. The rounded values are stored in a list called 'spring_group' and counted with Counter.
spring_group = []
for i in profit_2:
    spring_group.append(round_to_multiple(i, 10000000))
Counter(spring_group)
Counter({20000000: 5,
10000000: 7,
30000000: 4,
0: 15,
70000000: 4,
450000000: 1,
60000000: 3,
110000000: 1,
290000000: 1,
280000000: 1,
40000000: 3,
80000000: 1,
320000000: 1,
50000000: 1,
90000000: 2})
The maximum profit of Drama movies from the 'df_profit_season' dataframe that were released in Spring is about $450 million.
max(profit_2)
447351353
The minimum profit of Drama movies from the 'df_profit_season' dataframe that were released in Spring is about $35,000.
min(profit_2)
34913
# Spring profits from 0 to 50 million
for i in profit_2:
    if 0 <= i <= 50000000:
        print(i)
24154026 8554727 25358392 34913 20251930 14610760 88390 12744931 156309 294448 10948425 3835130 3943124 20000000 1711143 1250000 37707417 10300000 26721826 29802928 38984536 4847480 21028230 40506120 21556959 29964656 13945682 12698355 4856268 257845 659312 256669 401802 858737
# Spring profits from 50 to 100 million
for i in profit_2:
    if 50000000 <= i <= 100000000:
        print(i)
68711836 72678948 69137047 62667874 58693537 58491516 78809717 71633833 51603136 89410061 94673038
# Spring profits from 100 to 200 million
for i in profit_2:
    if 100000000 <= i <= 200000000:
        print(i)
108052686
# Spring profits from 200 to 300 million
for i in profit_2:
    if 200000000 <= i <= 300000000:
        print(i)
293281000 278014195
# Spring profits from 300 to 400 million
for i in profit_2:
    if 300000000 <= i <= 400000000:
        print(i)
317522294
# Spring profits of 400 million and above
for i in profit_2:
    if 400000000 <= i:
        print(i)
447351353
Showing the frequency of the profits of the Drama movies from the 'df_profit_season' dataframe that were released in Summer, with each profit rounded to the nearest 10 million. The rounded values are stored in a list called 'summer_group' and counted with Counter.
summer_group = []
for i in profit_3:
    summer_group.append(round_to_multiple(i, 10000000))
Counter(summer_group)
Counter({30000000: 2,
60000000: 1,
50000000: 1,
10000000: 9,
0: 4,
70000000: 4,
40000000: 5,
120000000: 1,
80000000: 1,
540000000: 1,
260000000: 1,
220000000: 1,
940000000: 1,
270000000: 1,
130000000: 1,
190000000: 2,
140000000: 1,
20000000: 2})
The maximum profit of Drama movies from the 'df_profit_season' dataframe that were released in Summer is about $940 million.
max(profit_3)
941214868
The minimum profit of Drama movies from the 'df_profit_season' dataframe that were released in Summer is about $2 million.
min(profit_3)
1927779
# Summer profits from 0 to 50 million
for i in profit_3:
    if 0 <= i <= 50000000:
        print(i)
26604054 14131551 8153415 2669782 14718173 36918287 33102988 12815212 7423752 42892670 43947950 12469621 7657973 4478084 41540205 44168692 11477345 1927779 2548651 16283563 8000000 18912216
# Summer profits from 50 to 100 million
for i in profit_3:
    if 50000000 <= i <= 100000000:
        print(i)
60133905 53273049 70975239 70986904 74830111 81120329 67356170
# Summer profits from 100 to 200 million
for i in profit_3:
    if 100000000 <= i <= 200000000:
        print(i)
120036382 132552290 188120004 188265198 143806510
# Summer profits from 200 to 300 million
for i in profit_3:
    if 200000000 <= i <= 300000000:
        print(i)
255500000 216100000 267142000
# Summer profits from 500 to 600 million
for i in profit_3:
    if 500000000 <= i <= 600000000:
        print(i)
544368315
# Summer profits of 800 million and above
for i in profit_3:
    if 800000000 <= i:
        print(i)
941214868
Showing the frequency of the profits of the Drama movies from the 'df_profit_season' dataframe that were released in Autumn, with each profit rounded to the nearest 50 million. The rounded values are stored in a list called 'autumn_group' and counted with Counter.
autumn_group = []
for i in profit_4:
    autumn_group.append(round_to_multiple(i, 50000000))
collections.Counter(autumn_group)
Counter({300000000: 3,
0: 43,
150000000: 5,
50000000: 22,
400000000: 1,
600000000: 1,
100000000: 2,
200000000: 1})
The maximum profit of Drama movies from the 'df_profit_season' dataframe that were released in Autumn is about $580 million.
max(profit_4)
583698673
The minimum profit of Drama movies from the 'df_profit_season' dataframe that were released in Autumn is $8,404.
min(profit_4)
8404
# Autumn profits from 0 to 50 million
for i in profit_4:
    if 0 <= i <= 50000000:
        print(i)
19966854 13147416 9898681 17017873 8270399 23262783 23830713 31043521 12417298 12499242 222016 17033227 35669037 9295324 4328516 19282640 4438911 48766923 1500000 2000000 47784 4609597 4344615 6741732 34605762 32973297 48954968 5164458 31440294 11587135 35099643 23794409 5850377 35552675 22004627 15059418 2281732 20044909 20069303 10369708 33185884 4152584 3478400 8404 18912216 15465835 15390895 1315026 2311944 3664240 1038916 3546453 958404
# Autumn profits from 50 to 100 million
for i in profit_4:
    if 50000000 <= i <= 100000000:
        print(i)
54735925 69233867 59068724 58985708 52500000 77551594 58660270 57086711 51076141 72831866 52091915 50167430 50167430
# Autumn profits from 100 to 200 million
for i in profit_4:
    if 100000000 <= i <= 200000000:
        print(i)
129558438 150297525 163591522 156127894 122498338 136567581
# Autumn profits from 200 to 300 million
for i in profit_4:
    if 200000000 <= i <= 300000000:
        print(i)
284604712 285937718 201120004
# Autumn profits from 300 to 500 million
for i in profit_4:
    if 300000000 <= i <= 500000000:
        print(i)
307567189 418656843
# Autumn profits of 500 million and above
for i in profit_4:
    if 500000000 <= i:
        print(i)
583698673
%%js
Highcharts.chart('container111', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: 'Group A: The revenue of movies that were released between January and July'
},
subtitle: {
text: 'Group one: The budget of each movie is between 42 and 100 million'
},
xAxis: {
categories: ['50-80 Million','80-100 Million', '100-150 Million', '150-200 Million', '200-250 Million',
'250-350 Million', '350-460 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Revenue: (millions)|Amount of Movies: 41',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#C41E3A",
data: [
{
name: "50-80 Million",
y: 7,
drilldown: "50-80 Million"
},
{
name: "80-100 Million",
y: 12,
drilldown: "80-100 Million"
},
{
name: "100-150 Million",
y: 24,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 20,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 5,
drilldown: "200-250 Million"
},
{
name: "250-350 Million",
y: 15,
drilldown: "250-350 Million"
},
{
name: "350-460 Million",
y: 17,
drilldown: "350-460 Million"
},
]
}
]
});
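The `y` values in the chart above are percentages of the movies in the group, typed in by hand. A minimal Python sketch, assuming a count of movies per revenue range is available, shows how the `data` entries could be generated instead. The helper `percent_series` and the counts below are hypothetical, chosen only so that they total 41 movies.

```python
import json

def percent_series(labels, counts):
    """Build Highcharts-style points with whole-number percentage shares."""
    total = sum(counts)
    return [
        {"name": label, "y": round(100 * count / total), "drilldown": label}
        for label, count in zip(labels, counts)
    ]

labels = ["50-80 Million", "80-100 Million", "100-150 Million", "150-200 Million",
          "200-250 Million", "250-350 Million", "350-460 Million"]
counts = [3, 5, 10, 8, 2, 6, 7]  # hypothetical counts, 41 movies in total

series = percent_series(labels, counts)
print(json.dumps(series, indent=2))
```

The printed JSON can be pasted as the `data` array of a series, which removes the need to compute each percentage by hand.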
%%js
Highcharts.chart('container222', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title:{text:''},
subtitle: {
text: 'Group two: The budget of each movie is between 100 and 410 million'
},
xAxis: {
categories: ['100-150 Million', '150-200 Million', '200-250 Million', '250-350 Million',
'350-450 Million', '450-500 Million', '500-650 Million', '650-800 Million',
'800 Million-1.5 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Revenue: (millions-billions)|Amount of Movies: 51',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#C41E3A",
data: [
{
name: "100-150 Million",
y: 2,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 4,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 19,
drilldown: "200-250 Million"
},
{
name: "250-350 Million",
y: 6,
drilldown: "250-350 Million"
},
{
name: "350-450 Million",
y: 15,
drilldown: "350-450 Million"
},
{
name: "450-500 Million",
y: 9,
drilldown: "450-500 Million"
},
{
name: "500-650 Million",
y: 13,
drilldown: "500-650 Million"
},
{
name: "650-800 Million",
y: 11,
drilldown: "650-800 Million"
},
{
name:"800 Million-1.5 Billion",
y: 21,
drilldown: "800 Million-1.5 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container333', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: 'Group B: The revenue of movies that were released between August and December'
},
subtitle: {
text: 'Group one: The budget of each movie is between 50 and 100 million'
},
xAxis: {
categories: ['60-80 Million', '80-100 Million','100-150 Million', '150-200 Million', '200-250 Million',
'250-350 Million', '350-450 Million', '450-520 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Revenue: (millions)|Total of Movies: 48',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#702963",
data: [
{
name: "60-80 Million",
y: 10,
drilldown: "60-80 Million"
},
{
name: "80-100 Million",
y: 8,
drilldown: "80-100 Million"
},
{
name: "100-150 Million",
y: 15,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 21,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 15,
drilldown: "200-250 Million"
},
{
name: "250-350 Million",
y: 15,
drilldown: "250-350 Million"
},
{
name: "350-450 Million",
y: 10,
drilldown: "350-450 Million"
},
{
name:"450-520 Million",
y: 6,
drilldown: "450-520 Million"
},
]
}
]
});
%%js
Highcharts.chart('container444', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: ''
},
subtitle: {
text: 'Group two: The budget of each movie is between 100 and 350 million'
},
xAxis: {
categories: [ '100-150 Million', '150-200 Million','200-250 Million', '300-350 Million',
'350-450 Million', '550-650 Million', '650-800 Million', '800 Million-2.2 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Revenue: (millions-billions)|Total of Movies: 24',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#702963",
data: [{
name: "100-150 Million",
y: 8,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 12,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 4,
drilldown: "200-250 Million"
},
{
name: "300-350 Million",
y: 4,
drilldown: "300-350 Million"
},
{
name: "350-450 Million",
y: 17,
drilldown: "350-450 Million"
},
{
name: "550-650 Million",
y: 8,
drilldown: "550-650 Million"
},
{
name: "650-800 Million",
y: 21,
drilldown: "650-800 Million"
},
{
name:"800 Million-2.2 Billion",
y: 25,
drilldown: "800 Million-2.2 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container555', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: 'Season A: The profit of movies that were released in the Winter and Summer '
},
subtitle: {
text: 'Group one: The opening weekend of each movie is between 4 and 50 million'
},
xAxis: {
categories: ['10-50 Million','50-100 Million', '100-150 Million', '150-200 Million', '200-250 Million', '250-300 Million','300-450 Million', '2 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-billions)|Amount of Movies: 67',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#ff4500",
data: [{
name: "10-50 Million",
y: 20,
drilldown: "10-50 Million"
},
{
name: "50-100 Million",
y: 27,
drilldown: "50-100 Million"
},
{
name: "100-150 Million",
y: 19,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 10,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 8,
drilldown: "200-250 Million"
},
{
name: "250-300 Million",
y: 6,
drilldown: "250-300 Million"
},
{
name: "300-450 Million",
y: 6,
drilldown: "300-450 Million"
},
{
name: "2 Billion",
y: 2,
drilldown: "2 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container666', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: ' '
},
subtitle: {
text: 'Group two: The opening weekend of each movie is between 50 and 250 million'
},
xAxis: {
categories: ['200-300 Million', '300-350 Million', '400-450 Million', '500-550 Million', '700-800 Million', '800-900 Million',
'900 Million-2 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-billions)|Amount of Movies: 21',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#ff4500",
data: [
{
name: "200-300 Million",
y: 24,
drilldown: "200-300 Million"
},
{
name: "300-350 Million",
y: 24,
drilldown: "300-350 Million"
},
{
name: "400-450 Million",
y: 10,
drilldown: "400-450 Million"
},
{
name: "500-550 Million",
y: 10,
drilldown: "500-550 Million"
},
{
name: "700-800 Million",
y: 10,
drilldown: "700-800 Million"
},
{
name: "800-900 Million",
y: 10,
drilldown: "800-900 Million"
},
{
name: "900 Million-2 Billion",
y: 14,
drilldown: "900 Million-2 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container777', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: 'Season B: The profit of movies that were released in the Spring and Autumn'
},
subtitle: {
text: 'Group one: The opening weekend of each movie is between 10 and 50 million'
},
xAxis: {
categories: ['10-50 Million', '50-100 Million','100-150 Million','150-200 Million',
'200-250 Million','250-300 Million', '300-450 Million', '550 Million',
'1.3 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-billions)|Amount of Movies: 51',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "10-50 Million",
y: 33,
drilldown: "10-50 Million"
},
{
name: "50-100 Million",
y: 10,
drilldown: "50-100 Million"
},
{
name: "100-150 Million",
y: 20,
drilldown: "100-150 Million"
},
{
name: "150-200 Million",
y: 12,
drilldown: "150-200 Million"
},
{
name: "200-250 Million",
y: 2,
drilldown: "200-250 Million"
},
{
name: "250-300 Million",
y: 6,
drilldown: "250-300 Million"
},
{
name: "300-450 Million",
y: 14,
drilldown: "300-450 Million"
},
{
name: "550 Million",
y: 2,
drilldown: "550 Million"
},
{
name: "1.3 Billion",
y: 2,
drilldown: "1.3 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container888', {
chart: {
type: 'bar',
width: 800,
height: 400
},
title: {
text: ' '
},
subtitle: {
text: 'Group two: The opening weekend of each movie is between 50 and 380 million'
},
xAxis: {
categories: [ '100-200 Million','200-340 Million', '350-500 Million',
'500-550 Million', '550-650 Million','650-750 Million',
'1.1 Billion'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-billions)|Amount of Movies: 24',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018",
data: [{
name: "100-200 Million",
y: 13,
drilldown: "100-200 Million"
},
{
name: "200-340 Million",
y: 21,
drilldown: "200-340 Million"
},
{
name: "350-500 Million",
y: 25,
drilldown: "350-500 Million"
},
{
name: "500-550 Million",
y: 8,
drilldown: "500-550 Million"
},
{
name: "550-650 Million",
y: 21,
drilldown: "550-650 Million"
},
{
name: "650-750 Million",
y: 12,
drilldown: "650-750 Million"
},
{
name: "1.1 Billion",
y: 4,
drilldown: "1.1 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container999', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Winter: The opening weekend of movies that were released in winter'
},
subtitle: {
text: 'Group one: The budget of each movie is between 50 and 80 million'
},
xAxis: {
categories: ['3.5 Million', '10-15 Million','15-20 Million','20-30 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 16',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "3.5 Million",
y: 6,
drilldown: "3.5 Million"
},
{
name: "10-15 Million",
y: 10,
drilldown: "10-15 Million"
},
{
name: "15-20 Million",
y: 20,
drilldown: "15-20 Million"
},
{
name: "20-30 Million",
y: 12,
drilldown: "20-30 Million"
},
]
}
]
});
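The `y` values in this chart are displayed as percentages, yet the four values entered above (6, 10, 20 and 12) add up to 48. A small sketch of a sanity check, written in Python and assuming the values are whole-number percentages of one group, can flag a series like this before it is charted. `percent_total_ok` is a hypothetical helper.

```python
def percent_total_ok(y_values, tolerance=None):
    """Return True when whole-number percentages add up to 100 within rounding error."""
    if tolerance is None:
        # Each rounded value can be off by at most 0.5.
        tolerance = len(y_values) / 2
    return abs(sum(y_values) - 100) <= tolerance

print(percent_total_ok([7, 12, 24, 20, 5, 15, 17]))  # values of the first Group A chart
print(percent_total_ok([6, 10, 20, 12]))             # values of the Winter chart above
```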
%%js
Highcharts.chart('container991', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: ''
},
subtitle: {
text: 'Group two: The budget of each movie is between 80 and 320 million'
},
xAxis: {
categories: ['8-10 Million', '10-15 Million','20-30 Million','30-40 Million',
'50-70 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 18',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "8-10 Million",
y: 11,
drilldown: "8-10 Million"
},
{
name: "10-15 Million",
y: 22,
drilldown: "10-15 Million"
},
{
name: "20-30 Million",
y: 28,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 11,
drilldown: "30-40 Million"
},
{
name: "50-70 Million",
y: 17,
drilldown: "50-70 Million"
},
]
}
]
});
%%js
Highcharts.chart('container992', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Spring: The opening weekend of movies that were released in spring'
},
subtitle: {
text: 'Group one: The budget of each movie is between 50 and 80 million'
},
xAxis: {
categories: ['3.6 Million', '10-15 Million','15-20 Million','20-30 Million','30-40 Million'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 12',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [{
name: "3.6 Million",
y: 8,
drilldown: "3.6 Million"
},
{
name: "10-15 Million",
y: 25,
drilldown: "10-15 Million"
},
{
name: "15-20 Million",
y: 17,
drilldown: "15-20 Million"
},
{
name: "20-30 Million",
y: 33,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 17,
drilldown: "30-40 Million"
},
]
}
]
});
%%js
Highcharts.chart('container993', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title:{text:''},
subtitle: {
text: 'Group two: The budget of each movie is between 80 and 400 million'
},
xAxis: {
categories: ['6-20 Million','20-30 Million','30-40 Million',
'40-70 Million', '70-90 Million','90-100 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 33',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "6-20 Million",
y: 12,
drilldown: "6-20 Million"
},
{
name: "20-30 Million",
y: 21,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 12,
drilldown: "30-40 Million"
},
{
name: "40-70 Million",
y: 21,
drilldown: "40-70 Million"
},
{
name: "70-90 Million",
y: 12,
drilldown: "70-90 Million"
},
{
name: "90-100 Million",
y: 6,
drilldown: "90-100 Million"
},
]
}
]
});
%%js
Highcharts.chart('container994', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Summer: The opening weekend of movies that were released in the summer'
},
subtitle: {
text: 'Group one: The budget of each movie is between 50 and 90 million'
},
xAxis: {
categories: [ '10-15 Million','15-20 Million','20-30 Million','30-40 Million','50-70 Million'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 28',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "10-15 Million",
y: 14,
drilldown: "10-15 Million"
},
{
name: "15-20 Million",
y: 21,
drilldown: "15-20 Million"
},
{
name: "20-30 Million",
y: 32,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 11,
drilldown: "30-40 Million"
},
{
name: "50-70 Million",
y: 11,
drilldown: "50-70 Million"
},
]
}
]
});
%%js
Highcharts.chart('container995', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title:{text:''},
subtitle: {
text: 'Group two: The budget of each movie is between 90 and 200 million'
},
xAxis: {
categories: ['9-20 Million','20-30 Million','30-40 Million','40-50 Million',
'50-70 Million', '70-80 Million','90-140 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 29',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "9-20 Million",
y: 10,
drilldown: "9-20 Million"
},
{
name: "20-30 Million",
y: 28,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 7,
drilldown: "30-40 Million"
},
{
name: "40-50 Million",
y: 7,
drilldown: "40-50 Million"
},
{
name: "50-70 Million",
y: 21,
drilldown: "50-70 Million"
},
{
name: "70-80 Million",
y: 7,
drilldown: "70-80 Million"
},
{
name: "90-140 Million",
y: 21,
drilldown: "90-140 Million"
},
]
}
]
});
%%js
Highcharts.chart('container996', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Autumn: The opening weekend of movies that were released in the autumn'
},
subtitle: {
text: 'Group one: The budget of each movie is between 50 and 80 million'
},
xAxis: {
categories: [ '8-12 Million','20-30 Million','30-40 Million','40-50 Million'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 16',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "8-12 Million",
y: 19,
drilldown: "8-12 Million"
},
{
name: "20-30 Million",
y: 35,
drilldown: "20-30 Million"
},
{
name: "30-40 Million",
y: 19,
drilldown: "30-40 Million"
},
{
name: "40-50 Million",
y: 6,
drilldown: "40-50 Million"
},
]
}
]
});
%%js
Highcharts.chart('container997', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title:{text:''},
subtitle: {
text: 'Group two: The budget of each movie is between 80 and 300 million'
},
xAxis: {
categories: ['3-13 Million','20-35 Million','50-60 Million','70-90 Million',
'90-130 Million'],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Opening Weekend: (millions)|Amount of Movies: 14',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#960018 ",
data: [
{
name: "3-13 Million",
y: 21,
drilldown: "3-13 Million"
},
{
name: "20-35 Million",
y: 21,
drilldown: "20-35 Million"
},
{
name: "50-60 Million",
y: 14,
drilldown: "50-60 Million"
},
{
name: "70-90 Million",
y: 21,
drilldown: "70-90 Million"
},
{
name: "90-130 Million",
y: 21,
drilldown: "90-130 Million"
},
]
}
]
});
%%js
function dollarFormat(x) {
return '$' + Highcharts.numberFormat(x, 0, '.', ',');
}
var colors = Highcharts.getOptions().colors;
Highcharts.chart('container998', {
chart: {
type: 'column',
inverted: false,
height: 500
},
accessibility: {
series: {
descriptionFormatter: function (series) {
return series.type === 'line' ?
series.name + ', ' + dollarFormat(series.points[0].y) :
series.name + ' movie profits, bar series with ' +
series.points.length + ' bars.';
}
},
point: {
valuePrefix: '$'
},
keyboardNavigation: {
seriesNavigation: {
mode: 'serialize'
}
}
},
title: {
text: 'The total profit of movies in the Drama genre that were released in each season',
margin: 35
},
subtitle: {
text: 'There are four seasons in a year: Winter (December, January, February), Spring (March, April, May), Summer (June, July, August), Autumn (September, October, November)'
},
xAxis: {
visible: false,
accessibility: {
description: 'Movies',
rangeDescription: ''
}
},
yAxis: [{
min: 0,
max: 2000000000,
labels: {
format: '${text}'
},
title: {
text: 'Movie profit'
},
gridLineWidth: 1
}, {
accessibility: {
description: 'Season total'
},
opposite: true,
min: 0,
max: 14000000000,
gridLineWidth: 0,
labels: {
format: '${text}',
style: {
color: '#8F6666'
}
},
title: {
text: 'Season total',
style: {
color: '#8F6666'
}
}
}],
credits: {
enabled: false
},
plotOptions: {
column: {
keys: ['name', 'y'],
grouping: false,
pointPadding: 0.1,
groupPadding: 0,
tooltip: {
headerFormat: '<span style="font-size: 10px">' +
'<span style="color:{point.color}">\u25CF</span> ' +
'{series.name}</span><br/>',
pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
}
},
line: {
yAxis: 1,
lineWidth: 5,
accessibility: {
exposeAsGroupOnly: true
},
marker: {
enabled: false
},
enableMouseTracking: false,
linkedTo: ':previous',
dataLabels: {
enabled: true,
verticalAlign: 'bottom',
style: {
color: '#757575',
fontWeight: 'normal'
},
formatter: function () {
if (this.point === this.series.points[Math.floor(
this.series.points.length / 2
)]) {
return 'Total: $' + Highcharts.numberFormat(this.y, 0);
}
}
}
}
},
responsive: {
rules: [{
condition: {
maxWidth: 400
},
chartOptions: {
chart: {
spacingLeft: 3,
spacingRight: 5
},
yAxis: [{}, {
visible: false
}]
}
}]
},
series: [{
name: ' Winter',
color: "#ff9999",
borderColor: '#A59273',
borderWidth: 1,
data: [
[ ' The Adventures of Tintin ' , 243993951 ],
[ ' Spider-Man: Into The Spider-Verse 3D ' , 285381768 ],
[ ' Alvin and the Chipmunks: The Road Chip ' , 159517956 ],
[ ' Star Wars Ep. VII: The Force Awakens ' , 1747311220 ],
[ ' Fool\'s Gold ' , 36862966 ],
[ ' Alvin and the Chipmunks: The Squeakquel ' , 373483213 ],
[ ' Hook ' , 230854823 ],
[ ' Rumor Has It ' , 18933562 ],
[ ' Hall Pass ' , 19173475 ],
[ ' Titanic ' , 2008208395 ],
[ ' Aquaman ' , 986894640 ],
[ ' Edge of Darkness ' , 22812456 ],
[ ' It\'s Complicated ' , 139614744 ],
[ ' Alvin and the Chipmunks: Chipwrecked ' , 269088523 ],
[ ' The Tale of Despereaux ' , 30482317 ],
[ ' Seven Pounds ' , 112617328 ],
[ ' Stepmom ' , 109745279 ],
[ ' Sherlock Holmes ' , 408438212 ],
[ ' Escape Plan ' , 33735965 ],
[ ' Les Miserables ' , 377169052 ],
[ ' Unbroken ' , 98527824 ],
[ ' Broken Arrow ' , 83345997 ],
[ ' The Hateful Eight ' , 85864886 ],
[ ' Kangaroo Jack ' , 30723216 ],
[ ' Star Wars Ep. VIII: The Last Jedi ' , 999721747 ],
[ ' King Kong ' , 343517357 ],
[ ' Mission: Impossible - Ghost Protocol ' , 549713230 ],
[ ' Happy Feet Two ' , 22956466 ],
[ ' Australia ' , 85080810 ],
[ ' Blood Diamond ' , 71377916 ],
[ ' The Girl with the Dragon Tattoo ' , 149373970 ],
[ ' Valkyrie ' , 113932174 ],
[ ' Ocean\'s Eleven ' , 365728529 ],
]
}, {
type: 'line',
name: ' Winter',
data: [
10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,10614183967,
10614183967
],
color: "#ff9999"
}, {
name: ' Spring',
color: "#ff1919",
data: [
[ ' Pirates of the Caribbean: On Stranger Tides ' , 635063875 ],
[ ' Avengers: Age of Ultron ' , 1072413963 ],
[ ' Pirates of the Caribbean: At World\'s End ' , 663420425 ],
[ ' Solo: A Star Wars Story ' , 118151347 ],
[ ' Pirates of the Caribbean: Dead Men Tell No Tales ' , 558241137 ],
[ ' Indiana Jones and the Kingdom of the Crystal Skull ' , 601635413 ],
[ ' Shrek the Third ' , 647330936 ],
[ ' Dark Shadows ' , 88202668 ],
[ ' The Croods ' , 438068425 ],
[ ' Logan ' , 488461394 ],
[ ' Gladiator ' , 354683805 ],
[ ' Wonder Park ' , 15149422 ],
[ ' Die Hard: With a Vengeance ' , 276101666 ],
[ ' Tomb Raider ' , 183477501 ],
[ ' Divergent ' , 191014965 ],
[ ' Tomorrowland ' , 36627518 ],
[ ' Kung Fu Panda 3 ' , 377599142 ],
[ ' The Day After Tomorrow ' , 431319450 ],
[ ' Power Rangers ' , 22531552 ],
[ ' Kingdom of Heaven ' , 108853353 ],
[ ' The Sum of All Fears ' , 125500000 ],
[ ' The Dictator ' , 115148897 ],
[ ' Rambo III ' , 130715611 ],
[ ' The Adjustment Bureau ' , 76731325 ],
[ ' Inside Man ' , 135798265 ],
[ ' Fever Pitch ' , 10071069 ],
[ ' Spider-Man 3 ' , 636860230 ],
[ ' Thor ' , 299326618 ],
[ ' Rango ' , 110724600 ],
[ ' The Mummy Returns ' , 337040395 ],
[ ' Need for Speed ' , 128169619 ],
[ ' The Matrix ' , 398517383 ],
[ ' 300 ' , 394161935 ],
[ ' Wild Hogs ' , 193555383 ],
[ ' London Has Fallen ' , 135194085 ],
[ ' Hellboy ' , 39823958 ],
[ ' Jack the Giant Slayer ' , 2687603 ],
[ ' Furious 7 ' , 1328722794 ],
[ ' Star Trek Into Darkness ' , 277381584 ],
[ ' Monsters vs. Aliens ' , 206687380 ],
[ ' Poseidon ' , 21674817 ],
[ ' Fast Five ' , 505163454 ],
[ ' Godzilla ' , 251000000 ],
[ ' Epic ' , 162794441 ],
[ ' Volcano ' , 30100000 ],
],
pointStart: 36
}, {
type: 'line',
name: ' Spring',
data: [
13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,13361899403,
13361899403,13361899403,13361899403,13361899403,13361899403
],
pointStart: 36,
color:"#ff1919"
}, {
name: ' Summer',
color: "#990000",
data: [
[ ' Pirates of the Caribbean: Dead Man\'s Chest ' , 841215812 ],
[ ' Pacific Rim ' , 221002906 ],
[ ' Spider-Man: Homecoming ' , 705166350 ],
[ ' Transformers ' , 557272592 ],
[ ' The Last Airbender ' , 169713881 ],
[ ' Skyscraper ' , 179115534 ],
[ ' Public Enemies ' , 109782709 ],
[ ' Rush Hour 2 ' , 257425832 ],
[ ' Percy Jackson: Sea of Monsters ' , 110859554 ],
[ ' The Bourne Supremacy ' , 226001124 ],
[ ' Fast & Furious ' , 278064265 ],
[ ' Man of Steel ' , 442999518 ],
[ ' Mission: Impossible Rogue Nation ' , 538858992 ],
[ ' Atlantis: The Lost Empire ' , 96049020 ],
[ ' Who Framed Roger Rabbit? ' , 281500000 ],
[ ' Ocean\'s 8 ' , 227115976 ],
[ ' AVP: Alien Vs. Predator ' , 102543519 ],
[ ' Total Recall ' , 196400000 ],
[ ' Step Brothers ' , 63468793 ],
[ ' Pete\'s Dragon ' , 72768975 ],
[ ' Salt ' , 160650494 ],
[ ' Die Hard 2 ' , 169814025 ],
[ ' World Trade Center ' , 98295654 ],
[ ' The Dark Tower ' , 53461527 ],
[ ' Planes: Fire and Rescue ' , 106399644 ],
[ ' Bedazzled ' , 42376224 ],
[ ' Cleopatra ' , 29000000 ],
[ ' Legal Eagles ' , 9851591 ],
[ ' The Skeleton Key ' , 52256918 ],
[ ' The Mummy: Tomb of the Dragon Emperor ' , 230760225 ],
[ ' The Sorcerer\'s Apprentice ' , 57986320 ],
[ ' Harry Potter and the Order of the Phoenix ' , 793076457 ],
[ ' The Bourne Ultimatum ' , 314043396 ],
[ ' Prometheus ' , 277448265 ],
[ ' RoboCop ' , 122981799 ],
[ ' The Smurfs ' , 453749323 ],
[ ' Seabiscuit ' , 62715342 ],
[ ' Abraham Lincoln: Vampire Hunter ' , 69989730 ],
[ ' Space Cowboys ' , 63874043 ],
[ ' Death Race ' , 7516819 ],
[ ' 2 Guns ' , 71493015 ],
[ ' War for the Planet of the Apes ' , 337592267 ],
[ ' Charlie and the Chocolate Factory ' , 325825484 ],
[ ' Ghostbusters ' , 85008658 ],
[ ' Harry Potter and the Sorcerer\'s Stone ' , 850047606 ],
[ ' The Wolverine ' , 301456852 ],
[ ' The Patriot ' , 105300000 ],
[ ' True Lies ' , 265300000 ],
[ ' Point Break ' , 26704591 ],
[ ' Artificial Intelligence: AI ' , 145900000 ],
[ ' Fantastic Four ' , 245632750 ]
],
pointStart: 83
}, {
type: 'line',
name: ' Summer',
data: [
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,11613834371,
11613834371,11613834371,11613834371
],
pointStart: 83,
color: "#990000"
}, {
name: ' Autumn',
color: "#4d0000",
data: [
[ ' Justice League ' , 355945209 ],
[ ' Toy Story 2 ' , 421358276 ],
[ ' Thor: Ragnarok ' , 666980024 ],
[ ' Moana ' , 487517365 ],
[ ' The World is Not Enough ' , 226730660 ],
[ ' The Twilight Saga: Breaking Dawn, Part 1 ' , 561920051 ],
[ ' Ender\'s Game ' , 17983283 ],
[ ' The Departed ' , 199660619 ],
[ ' The Kingdom ' , 14009602 ],
[ ' Sleepy Hollow ' , 137068340 ],
[ ' The Boxtrolls ' , 51946251 ],
[ ' First Man ' , 45203825 ],
[ ' Shark Tale ' , 296917043 ],
[ ' Daddy\'s Home 2 ' , 105807183 ],
[ ' Murder on the Orient Express ' , 290922730 ],
[ ' The Peacemaker ' , 12967368 ],
[ ' The One ' , 23689126 ],
[ ' The Intern ' , 157115710 ],
[ ' Allied ' , 13266661 ],
[ ' I, Frankenstein ' , 9575290 ],
[ ' Money Train ' , 9224232 ],
[ ' Everest ' , 156297061 ],
[ ' Gone Girl ' , 307567189 ],
[ ' Jack Reacher: Never Go Back ' , 99946489 ],
[ ' The Jackal ' , 99356941 ],
[ ' Hugo ' , 47784 ],
[ ' Doctor Strange ' , 511404566 ],
[ ' Big Hero 6 ' , 487127828 ],
[ ' Harry Potter and the Goblet of Fire ' , 747099794 ],
[ ' Bolt ' , 178015029 ],
],
pointStart: 137
}, {
type: 'line',
name: 'Autumn',
data: [
6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,6692671529,
6692671529,6692671529,6692671529,6692671529,6692671529,6692671529
],
pointStart: 137,
color: "#4d0000"
}]
});
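The season grouping named in the chart subtitle can be sketched in Python. This is a minimal sketch, not the notebook's own method: the names `SEASON_BY_MONTH`, `parse_release_date` and `season_of` are hypothetical, and it assumes an English locale and that `Release_Date` strings come in the two formats seen in this project ('Nov 23, 2011' and 'September 24, 2004').

```python
from datetime import datetime

# Month number -> season, following the chart subtitle.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def parse_release_date(text):
    """Parse a release date written with an abbreviated or a full month name."""
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised release date: {text!r}")


def season_of(text):
    """Return the season name for a release-date string."""
    return SEASON_BY_MONTH[parse_release_date(text).month]


for date in ["Nov 23, 2011", "Dec 25, 2012", "April 16, 2004"]:
    print(date, "->", season_of(date))
```

A helper like this could be applied to each release date before the profits are summed per season.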
%%js
Highcharts.chart('container101', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Winter'
},
subtitle: {
text: 'The profit of the movies that were released in the Winter'
},
xAxis: {
categories: [ '20-50 Million','50-100 Million','100-200 Million','200-300 Million',
'300-400 Million','400-550 Million','900 Million-2.1 Billion'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-Billions)|Amount of Movies: 33',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#ff9999",
data: [
{
name: "20-50 Million",
y: 24,
drilldown: "20-50 Million"
},
{
name: "50-100 Million",
y: 15,
drilldown: "50-100 Million"
},
{
name: "100-200 Million",
y: 18,
drilldown: "100-200 Million"
},
{
name: "200-300 Million",
y: 12,
drilldown: "200-300 Million"
},
{
name: "300-400 Million",
y: 12,
drilldown: "300-400 Million"
},
{
name: "400-550 Million",
y: 6,
drilldown: "400-550 Million"
},
{
name: "900 Million-2.1 Billion",
y: 12,
drilldown: "900 Million-2.1 Billion"
},
]
}
]
});
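The percentages in the Winter chart can be reproduced from the Winter profit figures listed in the season chart. The sketch below is one possible way to do it, not necessarily how the numbers were originally produced. The names `WINTER_PROFITS`, `UPPER_EDGES` and `bin_percentages` are hypothetical, and the first bin is assumed to cover everything under 50 million.

```python
from bisect import bisect_right

# Winter profits copied from the season chart data.
WINTER_PROFITS = [
    243993951, 285381768, 159517956, 1747311220, 36862966, 373483213,
    230854823, 18933562, 19173475, 2008208395, 986894640, 22812456,
    139614744, 269088523, 30482317, 112617328, 109745279, 408438212,
    33735965, 377169052, 98527824, 83345997, 85864886, 30723216,
    999721747, 343517357, 549713230, 22956466, 85080810, 71377916,
    149373970, 113932174, 365728529,
]

# Upper edge of every bin except the last one, which is open-ended.
UPPER_EDGES = [50_000_000, 100_000_000, 200_000_000,
               300_000_000, 400_000_000, 550_000_000]


def bin_percentages(profits, upper_edges):
    """Share of movies per profit bin, rounded to whole percent."""
    counts = [0] * (len(upper_edges) + 1)
    for profit in profits:
        # bisect_right gives the index of the bin the profit falls into
        counts[bisect_right(upper_edges, profit)] += 1
    return [round(100 * count / len(profits)) for count in counts]


print(bin_percentages(WINTER_PROFITS, UPPER_EDGES))
print(sum(WINTER_PROFITS))  # the season total drawn as the line series
```

The same function could be reused for the other seasons by swapping in their profit lists and bin edges.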
%%js
Highcharts.chart('container102', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Spring'
},
subtitle: {
text: 'The profit of the movies that were released in the Spring'
},
xAxis: {
categories: [ '3-40 Million','50-100 Million','100-200 Million','200-300 Million',
'300-400 Million','400-500 Million','500-670 Million','1-1.3 Billion'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions-Billions)|Amount of Movies: 45',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#ff1919",
data: [
{
name: "3-40 Million",
y: 18,
drilldown: "3-40 Million"
},
{
name: "50-100 Million",
y: 4,
drilldown: "50-100 Million"
},
{
name: "100-200 Million",
y: 29,
drilldown: "100-200 Million"
},
{
name: "200-300 Million",
y: 11,
drilldown: "200-300 Million"
},
{
name: "300-400 Million",
y: 11,
drilldown: "300-400 Million"
},
{
name: "400-500 Million",
y: 6,
drilldown: "400-500 Million"
},
{
name: "500-670 Million",
y: 15,
drilldown: "500-670 Million"
},
{
name: "1-1.3 Billion",
y: 4,
drilldown: "1-1.3 Billion"
},
]
}
]
});
%%js
Highcharts.chart('container103', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Summer'
},
subtitle: {
text: 'The profit of the movies that were released in the Summer'
},
xAxis: {
categories: [ '8-50 Million','50-100 Million','100-200 Million','200-300 Million',
'300-400 Million','400-500 Million','500-600 Million','700-800 Million','800-850 Million'
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions)|Amount of Movies: 51',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#990000",
data: [
{
name: "8-50 Million",
y: 10,
drilldown: "8-50 Million"
},
{
name: "50-100 Million",
y: 24,
drilldown: "50-100 Million"
},
{
name: "100-200 Million",
y: 24,
drilldown: "100-200 Million"
},
{
name: "200-300 Million",
y: 20,
drilldown: "200-300 Million"
},
{
name: "300-400 Million",
y: 8,
drilldown: "300-400 Million"
},
{
name: "400-500 Million",
y: 4,
drilldown: "400-500 Million"
},
{
name: "500-600 Million",
y: 4,
drilldown: "500-600 Million"
},
{
name: "700-800 Million",
y: 4,
drilldown: "700-800 Million"
},
{
name: "800-850 Million",
y: 4,
drilldown: "800-850 Million"
},
]
}
]
});
%%js
Highcharts.chart('container104', {
chart: {
type: 'bar',
width: 400,
height: 400
},
title: {
text: 'Autumn'
},
subtitle: {
text: 'The profit of the movies that were released in the Autumn'
},
xAxis: {
categories: [ '50 Thousand-50 Million','50-100 Million','100-200 Million','200-300 Million',
'300-400 Million','400-500 Million','500-600 Million','700-850 Million',
],
title: {
text: null
}
},
yAxis: {
min: 0,
title: {
text: 'Profit: (millions)|Amount of Movies: 30',
align: 'high'
},
labels: {
overflow: 'justify'
}
},
legend: {
enabled: false
},
tooltip: {
valueSuffix: '%'
},
plotOptions: {
bar: {
dataLabels: {
enabled: true
}
},
series: {
borderWidth: 0,
dataLabels: {
enabled: true,
format: '{point.y:.1f}%'
}
}
},
credits: {
enabled: false
},
series: [
{
name: "Probability",
color: "#4d0000",
data: [
{
name: "50 Thousand-50 Million",
y: 30,
drilldown: "50 Thousand-50 Million"
},
{
name: "50-100 Million",
y: 10,
drilldown: "50-100 Million"
},
{
name: "100-200 Million",
y: 20,
drilldown: "100-200 Million"
},
{
name: "200-300 Million",
y: 10,
drilldown: "200-300 Million"
},
{
name: "300-400 Million",
y: 13,
drilldown: "300-400 Million"
},
{
name: "400-500 Million",
y: 7,
drilldown: "400-500 Million"
},
{
name: "500-600 Million",
y: 7,
drilldown: "500-600 Million"
},
{
name: "700-850 Million",
y: 13,
drilldown: "700-850 Million"
},
]
}
]
});
This is the blueprint for creating the fourth visualization, Revenue of Movies. Altair will be used to create this graph.
Blueprint:
The format of the dataframe needed for this graph is the same as the previous dataframe that was created.
The style of this graph is the Dot Dash Plot, which is found in Altair's Gallery. It is a scatter plot with an x-axis and a y-axis. The x-axis shows the revenue of each movie and the y-axis is not visible. A Dot Dash plot is a scatter plot with tick marks along the axes that show how the items in each category are spread out within the selection. To make a selection, drag the mouse to draw a box. Hovering over a point shows the name, system rating and revenue of the movie.
This is the blueprint for creating the fifth visualization, Budget of Movies. Altair will be used to create this graph.
Blueprint:
The format of the dataframe needed for this graph is the same as the previous dataframe that was created.
The style of this graph is the Dot Dash Plot, which is found in Altair's Gallery. It is a scatter plot with an x-axis and a y-axis. The y-axis shows the budget of each movie and the x-axis is not visible. A Dot Dash plot is a scatter plot with tick marks along the axes that show how the items in each category are spread out within the selection. To make a selection, drag the mouse to draw a box. Hovering over a point shows the name, system rating and budget of the movie.
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
Getting the Budget of all the 'R-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
budget = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R':
        budget.append(Drama_DataFrame.Production_Budget[i])
print(budget)
[150000000.0, 100000000.0, 68000000.0, 61000000.0, 60000000.0, 55000000.0, 55000000.0, 55000000.0, 52500000.0, 40000000.0, 37500000.0, 35000000.0, 31000000.0, 25000000.0, 23000000.0, 22500000.0, 22500000.0, 22000000.0, 21000000.0, 20000000.0, 20000000.0, 20000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 13000000.0, 13000000.0, 13000000.0, 12000000.0, 12000000.0, 12000000.0, 11800000.0, 11000000.0, 10000000.0, 10000000.0, 9400000.0, 8500000.0, 7000000.0, 7000000.0, 5000000.0, 5000000.0, 4900000.0, 4750000.0, 4000000.0, 4000000.0, 3500000.0, 3400000.0, 3300000.0, 3000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 1987650.0, 1500000.0, 1000000.0, 1000000.0, 1000000.0, 1000000.0, 250000.0, 135000.0, 100000.0, 6000000.0, 8500000.0, 20000000.0, 100000.0, 26000000.0, 6500000.0, 22000000.0, 2700000.0, 11500000.0, 9000000.0]
Getting the Budget of all the 'PG-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
budget1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG':
        budget1.append(Drama_DataFrame.Production_Budget[i])
print(budget1)
[180000000.0, 37000000.0, 31000000.0, 20000000.0, 20000000.0, 3000000.0, 1700000.0, 5100000.0, 10000000.0, 95000000.0, 3000000.0, 20000000.0, 40000000.0, 5000000.0, 422000.0, 5100000.0, 72000000.0, 11800000.0, 15000000.0, 32000000.0, 40000000.0, 65000000.0, 8000000.0, 9000000.0, 17000000.0, 30000000.0, 500000.0, 20000000.0, 11000000.0, 2000000.0, 23000000.0, 45000000.0, 15000000.0, 10000000.0, 32000000.0, 90000000.0, 10000000.0, 27000000.0, 16000000.0, 3000000.0, 15000000.0, 25000000.0, 34000000.0, 10000000.0, 20000000.0, 15000000.0, 12000000.0, 5000000.0, 7000000.0, 14000000.0, 15000000.0, 12000000.0, 28300000.0, 8000000.0, 7500000.0, 17000000.0, 5000000.0, 9000000.0, 15000000.0, 22000000.0, 5000000.0, 4500000.0, 4500000.0, 8000000.0, 16000000.0, 8200000.0, 28000000.0]
Getting the Budget of all the 'G-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
budget2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G':
        budget2.append(Drama_DataFrame.Production_Budget[i])
print(budget2)
[35446775.0, 700000.0, 8600000.0, 7000000.0, 18000000.0, 4400000.0, 17000000.0, 22000000.0, 20000000.0, 23000000.0, 15000000.0, 2700000.0, 70000000.0, 30000000.0, 2500000.0, 90000000.0, 666000.0, 85000000.0, 17000000.0, 10000000.0, 22000000.0, 18000000.0, 8200000.0, 60000000.0, 45000000.0, 858000.0, 17000000.0, 300000.0, 10000000.0, 6400000.0, 13000000.0, 1750000.0, 1700000.0, 3000000.0]
Getting the Budget of all the 'PG-13 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
budget3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13':
        budget3.append(Drama_DataFrame.Production_Budget[i])
print(budget3)
[110000000.0, 75000000.0, 60000000.0, 60000000.0, 55000000.0, 50000000.0, 50000000.0, 50000000.0, 50000000.0, 50000000.0, 49000000.0, 47000000.0, 44000000.0, 40000000.0, 40000000.0, 40000000.0, 40000000.0, 38000000.0, 37000000.0, 37000000.0, 36000000.0, 35000000.0, 35000000.0, 34000000.0, 33000000.0, 30000000.0, 30000000.0, 30000000.0, 28000000.0, 27500000.0, 26000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 24000000.0, 21000000.0, 20000000.0, 20000000.0, 19000000.0, 18000000.0, 18000000.0, 17000000.0, 17000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 15000000.0, 14000000.0, 13000000.0, 12000000.0, 12000000.0, 11000000.0, 11000000.0, 10000000.0, 10000000.0, 9700000.0, 9000000.0, 9000000.0, 7400000.0, 7000000.0, 6000000.0, 5000000.0, 5000000.0, 5000000.0, 5000000.0, 4500000.0, 4357373.0, 2600000.0, 2000000.0, 1400000.0, 250000.0, 175000.0]
Getting the Budget of all the 'NC-17 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
budget4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17':
        budget4.append(Drama_DataFrame.Production_Budget[i])
print(budget4)
[6500000.0, 12500000.0, 1000000.0, 20000.0, 955472.0, 1500000.0, 45000000.0, 9000000.0, 5000000.0, 15000000.0, 2734384.0, 15000000.0, 6500000.0, 4000000.0, 45000000.0, 15000000.0, 6500000.0, 4074940.0, 1000000.0, 1000000.0, 3565572.0, 12000000.0, 10000000.0, 15000000.0, 19000000.0, 350000.0, 1000000.0, 6500000.0, 4700000.0, 904765.0, 3000000.0, 700000.0, 34000000.0, 230000.0, 1000000.0, 3200000.0, 1000000.0, 1500000.0, 6500000.0, 1250000.0, 12000.0, 15000000.0, 2200000.0, 1300000.0, 15000000.0, 6400000.0, 50000.0, 3259572.0, 612072.0]
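The five loops above repeat the same filter once per system rating. A single pass that groups the values by rating is one alternative; the sketch below assumes plain Python lists rather than the real dataframe. `SAMPLE_ROWS` is a hypothetical stand-in built from a few rows of the table preview, and `group_by_rating` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical stand-in rows: (system rating, production budget),
# copied from the preview of 'Drama_DataFrame'.
SAMPLE_ROWS = [
    ("PG", 180000000.0),     # Hugo
    ("R", 150000000.0),      # The Wolfman
    ("PG-13", 110000000.0),  # Gravity
    ("R", 100000000.0),      # Django Unchained
    ("PG-13", 75000000.0),   # Sing
    ("NC-17", 15000000.0),   # A Dirty Shame
]


def group_by_rating(pairs):
    """Collect the values for every rating in one pass over the data."""
    grouped = defaultdict(list)
    for rating, value in pairs:
        grouped[rating].append(value)
    return dict(grouped)


budgets_by_rating = group_by_rating(SAMPLE_ROWS)
print(budgets_by_rating["R"])  # budgets of the R-rated sample rows
```

With the real data, the pairs could be produced by `zip(Drama_DataFrame.Rating, Drama_DataFrame.Production_Budget)`, and the same helper would work for `Averagerating`.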
Getting the Star Ratings of all the 'R-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
rating = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R':
        rating.append(Drama_DataFrame.Averagerating[i])
print(rating)
[5.8, 8.4, 5.7, 8.1, 5.7, 4.6, 4.5, 6.5, 7.4, 4.1, 7.1, 7.5, 7.3, 6.2, 7.1, 7.5, 7.1, 5.6, 6.1, 6.9, 7.1, 5.3, 7.5, 6.6, 6.6, 7.1, 6.9, 8.0, 7.7, 8.2, 6.9, 7.2, 6.6, 6.8, 7.2, 6.8, 7.3, 8.7, 7.2, 7.8, 7.5, 7.0, 5.2, 6.4, 8.1, 7.4, 7.9, 6.5, 6.8, 7.1, 8.5, 7.9, 5.3, 7.2, 7.6, 6.2, 7.1, 4.9, 7.0, 6.4, 7.4, 6.9, 6.2, 7.4, 3.8, 6.6, 6.8, 7.7, 6.6, 4.9, 6.3, 6.5, 5.5, 6.5, 6.8, 5.9, 6.8]
Getting the Star Ratings of all the 'PG-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
rating1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG':
        rating1.append(Drama_DataFrame.Averagerating[i])
print(rating1)
[7.5, 6.9, 6.5, 8.0, 6.0, 6.5, 7.8, 7.2, 6.4, 6.9, 6.5, 8.0, 7.8, 6.6, 5.9, 6.1, 6.9, 7.3, 6.6, 6.8, 6.8, 7.1, 7.4, 7.3, 7.1, 7.5, 6.5, 6.0, 6.4, 4.7, 7.3, 6.0, 6.7, 6.1, 6.4, 7.5, 7.2, 6.8, 7.7, 7.5, 7.8, 7.6, 7.2, 7.0, 6.3, 6.9, 7.2, 6.3, 7.3, 6.8, 7.6, 6.9, 7.3, 6.1, 6.0, 6.8, 6.5, 5.7, 6.1, 4.7, 6.9, 7.4, 7.0, 6.1, 6.1, 6.6, 7.5]
Getting the Star Ratings of all the 'G-rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
rating2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G':
        rating2.append(Drama_DataFrame.Averagerating[i])
print(rating2)
[7.2, 7.6, 7.3, 6.4, 7.3, 7.8, 7.7, 6.9, 8.0, 6.3, 6.5, 7.4, 7.0, 9.6, 9.0, 5.8, 7.1, 6.3, 7.6, 6.5, 6.9, 7.3, 8.1, 6.1, 8.5, 7.3, 7.8, 6.6, 8.1, 7.6, 7.9, 7.7, 6.3, 7.1]
Getting the Star Ratings of all the 'PG-13 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
rating3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13':
        rating3.append(Drama_DataFrame.Averagerating[i])
print(rating3)
[7.7, 7.1, 6.6, 6.8, 6.4, 7.2, 7.2, 6.5, 6.0, 6.6, 6.6, 7.9, 6.5, 7.6, 7.6, 5.7, 6.0, 6.9, 5.8, 6.0, 6.8, 7.6, 6.8, 7.1, 6.5, 6.8, 7.2, 6.4, 6.7, 6.9, 6.7, 8.1, 6.3, 6.5, 6.5, 6.8, 4.5, 7.2, 6.7, 7.4, 7.2, 7.6, 6.9, 6.6, 6.6, 5.6, 4.9, 7.1, 6.4, 6.3, 7.0, 6.9, 8.0, 6.4, 5.0, 6.8, 7.5, 6.4, 7.4, 7.9, 6.1, 6.6, 4.3, 5.6, 7.1, 6.4, 7.5, 6.4, 7.0, 6.4, 6.5, 7.4, 7.0, 7.6, 6.7, 7.0]
Getting the Star Ratings of all the 'NC-17 rated' movies in the Drama genre from the 'Drama_DataFrame' dataframe.
rating4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17':
        rating4.append(Drama_DataFrame.Averagerating[i])
print(rating4)
[7.2, 7.0, 5.6, 6.0, 5.7, 7.1, 4.9, 6.4, 7.2, 7.2, 5.1, 7.5, 7.2, 7.7, 4.9, 7.1, 7.2, 7.7, 7.4, 5.5, 5.6, 5.9, 6.7, 7.5, 7.1, 7.4, 7.4, 7.2, 6.9, 6.7, 6.2, 6.4, 7.2, 7.7, 7.0, 6.2, 6.1, 7.0, 7.8, 6.9, 6.0, 7.5, 7.7, 6.1, 5.1, 6.4, 5.5, 5.0, 7.1]
This is a function called 'Average' that gets the average of a list.
def Average(l):
    avg = sum(l) / len(l)
    return avg
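The 'Average' function raises a ZeroDivisionError on an empty list and returns NaN if any entry is NaN, which can matter because some columns in the dataframe (for example Runtime) contain NaN. The following is a minimal sketch of a more defensive variant; the name `average_ignoring_missing` is hypothetical, and returning None for "no usable values" is an assumption about what the caller would want.

```python
import math


def average_ignoring_missing(values):
    """Mean of the usable entries in `values`, skipping None and NaN."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        return None  # nothing to average; the caller decides what to do
    return sum(clean) / len(clean)


# Runtimes of the first five rows in the table preview (one is NaN).
runtimes = [126.0, float("nan"), 91.0, 165.0, 98.0]
print(average_ignoring_missing(runtimes))
```

For lists without missing values this gives the same result as 'Average'.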
Getting the average of all the 'Star Ratings' of each movie in the Drama Genre from the 'Drama_DataFrame' dataframe.
for i in [rating,rating1,rating2,rating3,rating4]:print(Average(i))
6.744155844155842 6.8044776119403 7.311764705882353 6.7144736842105255 6.630612244897959
Getting the average of all the 'Budget' of each movie in the Drama Genre from the 'Drama_DataFrame' dataframe.
for i in [budget,budget1,budget2,budget3,budget4]:print(Average(i))
18335359.09090909 21651074.62686567 20182963.970588237 25577399.64473684 7479975.040816327
Getting the average of all the 'Revenue' of each movie in the Drama Genre from the 'Drama_DataFrame' dataframe.
for i in [world_int,world_int1,world_int2,world_int3,world_int4]:print(Average(i))
74913332.64285715 103158408.56521739 146873371.52 104420678.015625 28266049.647058822
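The averages printed above are the same numbers that appear in the `data` arrays of the radial bar charts: money values with the decimals dropped and star ratings rounded to one decimal. A small sketch of that conversion is shown here. The helper name `to_chart_data` is hypothetical, and using `json.dumps` to build the array literal is an assumption, since the values may simply have been typed in by hand.

```python
import json

# Averages printed above, in the order R, PG, G, PG-13, NC-17.
avg_budget = [18335359.09090909, 21651074.62686567, 20182963.970588237,
              25577399.64473684, 7479975.040816327]
avg_rating = [6.744155844155842, 6.8044776119403, 7.311764705882353,
              6.7144736842105255, 6.630612244897959]


def to_chart_data(values, decimals=None):
    """Drop the decimals (default) or round to `decimals` places."""
    if decimals is None:
        return [int(v) for v in values]  # int() truncates toward zero
    return [round(v, decimals) for v in values]


budget_data = to_chart_data(avg_budget)
rating_data = to_chart_data(avg_rating, decimals=1)

# JSON arrays are valid JavaScript array literals for the 'data:' option.
print("data:", json.dumps(budget_data))
print("data:", json.dumps(rating_data))
```

Generating the arrays this way keeps the chart values in step with the dataframe if the averages change.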
This is the HTML script that loads the Highcharts library to visualize the Average Budget, Revenue and Star Ratings of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. This will be done using JavaScript and HTML below and will be saved in a .png file.
%%html
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/highcharts-more.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<table><tr><th></th><th></th><th></th><th></th></tr><tr><th></th><th></th><th></th><th></th></tr>
<tr>
<td><span class="gridMap" id="container6"></span></td>
<td><span class="gridMap" id="container7"></span></td>
<td><span class="gridMap" id="container8"></span></td>
</tr>
</table>
This is the JavaScript from the Highcharts library to visualize the 'Average Budget' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. This will be done using JavaScript and HTML and will be saved in a .png file called 'average-budget-of-all-sy.png'.
%%js
Highcharts.chart('container6', {
colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
chart: {
type: 'column',
height:550,
width:500,
inverted: true,
polar: true
},
title: {
text: 'Average Budget of All System Rating'
},
tooltip: {
outside: true
},
pane: {
size: '85%',
innerSize: '20%',
endAngle: 270
},
xAxis: {
tickInterval: 1,
labels: {
align: 'right',
useHTML: true,
allowOverlap: true,
step: 1,
y: 3,
style: {
fontSize: '13px'
}
},
lineWidth: 0,
categories: [
'R',
'PG',
'G',
'PG-13',
'NC-17'
]
},
yAxis: {
tickPositions:[0,2500000,5000000,7500000,10000000,12500000,15000000,17500000,20000000,22500000,25000000,27500000,30000000],
labels: {
formatter: function() {
return this.value / 1000000 + 'M';
}},
crosshair: {
enabled: true,
color: '#333'
},
lineWidth: 0,
tickInterval: 25,
reversedStacks: false,
endOnTick: true,
showLastLabel: true
},
plotOptions: {
column: {
stacking: 'normal',
borderWidth: 0,
pointPadding: 0,
groupPadding: 0.15
},
},
legend: {
labelFormatter: function () {
if(this.data.length > 0) {
return this.data[0].category;
} else {
return this.name;
}
}
},
series: [{
colorByPoint: true,
name: 'Average Budget',
data: [18335359, 21651074, 20182963, 25577399, 7479975]
}]
});
This is the JavaScript from the Highcharts library to visualize the 'Average Revenue' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. This will be done using JavaScript and HTML and will be saved in a .png file called 'average-revenue-of-all-sy.png'.
%%js
Highcharts.chart('container7', {
colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
chart: {
type: 'column',
height:550,
width:500,
inverted: true,
polar: true
},
title: {
text: 'Average Revenue of All System Rating'
},
tooltip: {
outside: true
},
pane: {
size: '85%',
innerSize: '20%',
endAngle: 270
},
xAxis: {
tickInterval: 1,
labels: {
align: 'right',
useHTML: true,
allowOverlap: true,
step: 1,
y: 3,
style: {
fontSize: '13px'
}
},
lineWidth: 0,
categories: [
'R',
'PG',
'G',
'PG-13',
'NC-17'
]
},
yAxis: {
tickPositions:[0,13000000,25000000,37500000,50000000,63500000,75000000,87500000,100000000,113500000,125000000, 137500000,150000000],
labels: {
formatter: function() {
return this.value / 1000000 + 'M';
}},
crosshair: {
enabled: true,
color: '#333'
},
lineWidth: 0,
tickInterval: 25,
reversedStacks: false,
endOnTick: true,
showLastLabel: true
},
plotOptions: {
column: {
stacking: 'normal',
borderWidth: 0,
pointPadding: 0,
groupPadding: 0.15
}
},
legend: {
labelFormatter: function () {
if(this.data.length > 0) {
return this.data[0].category;
} else {
return this.name;
}
}
},
series: [{
colorByPoint: true,
name: 'Average Revenue',
data: [74913332,103158408,146873371,104420678,28266049,]
}]
});
This is the JavaScript from the Highcharts library to visualize the 'Average Ratings' of all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Radial Bar Chart'. A 'Radial Bar Chart' is similar to a bar chart, but the y-axis is circular. This will be saved in a .png file called 'average-rating-of-all-sy.png'.
%%js
Highcharts.chart('container8', {
colors: ['#ff5500', '#D00000', '#800000', '#A00000', 'red'],
chart: {
type: 'column',
height:550,
width:500,
inverted: true,
polar: true
},
title: {
text: 'Average Rating of All System Ratings'
},
tooltip: {
outside: true
},
pane: {
size: '85%',
innerSize: '20%',
endAngle: 270
},
xAxis: {
tickInterval: 1,
labels: {
align: 'right',
useHTML: true,
allowOverlap: true,
step: 1,
y: 3,
style: {
fontSize: '13px'
}
},
lineWidth: 0,
categories: [
'R',
'PG',
'G',
'PG-13',
'NC-17'
]
},
yAxis: {
tickPositions:[0,.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5],
crosshair: {
enabled: true,
color: '#333'
},
lineWidth: 0,
tickInterval: 25,
reversedStacks: false,
endOnTick: true,
showLastLabel: true
},
plotOptions: {
column: {
stacking: 'normal',
borderWidth: 0,
pointPadding: 0,
groupPadding: 0.15
}
},
legend: {
labelFormatter: function () {
if(this.data.length > 0) {
return this.data[0].category;
} else {
return this.name;
}
}
},
series: [{
colorByPoint: true,
name: 'Average Rating',
data: [6.7,6.8,7.3,6.7,6.6]
}]
});
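The per-rating averages that feed the radial bar charts above are hard-coded in the `data` arrays. As a minimal sketch of how such averages could be computed, the example below groups values by system rating and rounds the mean to one decimal place. The five (rating, average rating) pairs are taken from the rows visible in the 'Drama_DataFrame' preview in this notebook; the helper name `average_by_rating` is hypothetical, and because the sample is tiny the results will not match the chart values.

```python
from collections import defaultdict
from statistics import mean

# (Rating, Averagerating) pairs from five rows of the Drama_DataFrame preview.
# This is only a small sample, not the full 306-row dataframe.
rows = [("PG", 7.5), ("R", 5.8), ("PG-13", 7.7), ("R", 8.4), ("PG-13", 7.1)]


def average_by_rating(pairs):
    """Group values by rating and return the mean of each group, rounded to 1 decimal."""
    grouped = defaultdict(list)
    for rating, value in pairs:
        grouped[rating].append(value)
    return {rating: round(mean(values), 1) for rating, values in grouped.items()}


averages = average_by_rating(rows)
print(averages)
```

The resulting dictionary can then be read off in the same order as the chart's `categories` list when filling in the `data` array.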
Charts: The Average Budgets | The Average Revenue | The Average Rating
This is the blueprint for creating the seventh visualization, Movies that made Profit. Highcharts will be used to create this graph.
The graph used for this visualization is the Highcharts Lollipop series, which is found in the Highcharts Demos. Lollipop charts are variants of column charts, with a circle marker for the data value and a line extending to the axis. The first step is to understand the format of the script. This graph uses two types of code: HTML and Javascript. JupyterLab has magic commands that support HTML and Javascript. Magics use the '%' syntax element, and cell magics such as '%%HTML' and '%%js' apply to a whole cell.
Before the HTML and Javascript are scripted, a dataframe needs to be made to extract all the movies that made a profit in the Drama genre. The two variables needed from the parent dataframe '_all_drama_info1_' are the movie name and the profit as an integer.
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
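The extraction step described before the table (movie name and integer profit for the profitable movies of one system rating) can be sketched in plain Python. The five rows below are copied from the preview above; the function name `profitable` and the list-of-dicts layout are assumptions made for illustration, not the notebook's actual code, which works on the dataframe itself.

```python
# Five rows from the Drama_DataFrame preview, reduced to the columns needed here.
rows = [
    {"Movie": "Hugo", "Rating": "PG", "Profit": 47784.0},
    {"Movie": "The Wolfman", "Rating": "R", "Profit": -7365642.0},
    {"Movie": "Gravity", "Rating": "PG-13", "Profit": 583698673.0},
    {"Movie": "Django Unchained", "Rating": "R", "Profit": 349948323.0},
    {"Movie": "Sing", "Rating": "PG-13", "Profit": 559454789.0},
]


def profitable(records, rating):
    """Return parallel lists of names and integer profits for one rating,
    keeping only movies whose profit is not negative."""
    selected = [r for r in records if r["Rating"] == rating and r["Profit"] >= 0]
    names = [r["Movie"] for r in selected]
    profits = [int(r["Profit"]) for r in selected]
    return names, profits


name, profit_int = profitable(rows, "R")
print(name, profit_int)
```

Keeping the names and profits in two parallel lists matches how `name` and `profit_int` are used in the cells that follow.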
Getting the 'Profit' of each 'R-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.
sum_r = []
for i in profit_int:
    if i < 0: continue
    else: sum_r.append(i)
print(sum_r)
[349948323, 307567189, 24154026, 326398492, 316350619, 19966854, 82112435, 530998101, 13147416, 129558438, 54735925, 9898681, 8554727, 17017873, 26604054, 8270399, 318266710, 25358392, 23262783, 7859167, 23830713, 34913, 31043521, 45178935, 60133905, 12417298, 69233867, 3765283, 12499242, 12636004, 222016, 53273049, 36954520, 17033227, 35669037, 20251930, 14610760, 14131551, 9295324, 8153415, 88390, 4328516, 19282640, 12744931, 15566240, 4438911, 156309, 294448, 2669782, 48766923, 68711836, 14718173, 1851683, 556082, 1500000, 2000000]
Getting the 'Total Profit' of all the 'R-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each R-rated movie in the 'Drama_DataFrame' dataframe because it will be used in the Javascript graph below.
var_r = []
for i in profit_int: var_r.append(sum(sum_r))
print(var_r)
[3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978]
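The repeated total above can also be built without a loop. The following is a small sketch, assuming the goal is one copy of the category total per movie so that a line series of the same length as the column series can be drawn. Only the first three R-rated profits from the output above are used, so the total here is smaller than the full 3278073978.

```python
# First three R-rated profits from the printed sum_r output (sample only).
sum_r = [349948323, 307567189, 24154026]

# Compute the total once, then repeat it so the list has one entry per movie.
total = sum(sum_r)
var_r = [total] * len(sum_r)

print(var_r)
```

Computing the sum once and repeating it avoids re-summing the list on every loop iteration, and it ties the length of the repeated list directly to the number of movies.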
Getting the 'Profit' of each 'NC-17 rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.
sum_nc17 = []
for i in profit_int4:
    if i < 0: continue
    else: sum_nc17.append(i)
print(sum_nc17)
[13912841, 4856268, 8404, 257845, 659312, 18912216, 89410061, 121165, 52091915, 13912841, 15465835, 307113, 13912841, 15390895, 15566240, 1315026, 256669, 201120004, 50167430, 2311944, 13912841, 2548651, 16283563, 3664240, 1038916, 8000000, 18912216, 94673038, 34897711, 401802, 50167430, 3546453, 958404, 858737]
Getting the 'Total Profit' of all the 'NC-17 rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each NC-17 rated movie in the 'Drama_DataFrame' dataframe because it will be used in the Javascript graph below.
var_nc17 = []
for i in profit_int4: var_nc17.append(sum(sum_nc17))
print(var_nc17)
[759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867, 759820867]
Getting the 'Profit' of each 'PG-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.
sum_pg = []
for i in profit_int1:
    if i < 0: continue
    else: sum_pg.append(i)
print(sum_pg)
[47784, 59068724, 284604712, 72678948, 70975239, 10531500, 4609597, 36918287, 447351353, 70986904, 285937718, 176601214, 33102988, 26696000, 35694916, 4344615, 6741732, 74830111, 10948425, 120587063, 34605762, 32973297, 69137047, 62667874, 83269971, 120036382, 81120329, 3835130, 118582776, 3101815, 48954968, 5164458, 107956187, 31440294, 12815212, 150297525, 21856053, 104285432, 28716963, 7423752, 108052686, 544368315, 42892670, 3943124, 71808942, 20000000]
Getting the 'Total Profit' of all the 'PG-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each PG-rated movie in the 'Drama_DataFrame' dataframe because it will be used in the Javascript graph below.
var_pg = []
for i in profit_int1: var_pg.append(sum(sum_pg))
print(var_pg)
[3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794]
Getting the 'Profit' of each 'PG-13 rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.
sum_pg13 = []
for i in profit_int3:
    if i < 0: continue
    else: sum_pg13.append(i)
print(sum_pg13)
[583698673, 559454789, 77551594, 35552675, 163591522, 129748880, 58660270, 22004627, 156127894, 4478084, 122498338, 129590606, 78809717, 136567581, 60143987, 49309093, 217276928, 26721826, 29802928, 132552290, 167618160, 38984536, 66050951, 15059418, 188120004, 117033509, 71633833, 41540205, 4847480, 57917283, 40282881, 188265198, 2281732, 57086711, 317522294, 21028230, 36545707, 40506120, 113955898, 5601987, 44168692, 20044909, 20069303, 20909437, 11477345, 67356170, 51076141, 51603136, 21556959, 27087044, 72831866, 12971021, 23787727, 29964656, 10369708, 143806510, 36699612, 13945682, 1205034, 12698355, 33185884, 4152584, 3478400, 1927779]
Getting the 'Total Profit' of all the 'PG-13 rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each PG-13 rated movie in the 'Drama_DataFrame' dataframe because it will be used in the Javascript graph below.
var_pg13 = []
for i in profit_int3: var_pg13.append(sum(sum_pg13))
print(var_pg13)
[5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393]
Getting the 'Profit' of each 'G-rated movie' in the Drama Genre from the 'Drama_DataFrame' dataframe.
sum_g = []
for i in profit_int2:
    if i < 0: continue
    else: sum_g.append(i)
print(sum_g)
[1711143, 11587135, 58693537, 418656843, 43947950, 12469621, 35099643, 255500000, 216100000, 1250000, 3851000, 58985708, 7657973, 58491516, 293281000, 278014195, 30482317, 941214868, 267142000, 55071636, 37707417, 23794409, 52500000, 5850377, 10300000]
Getting the 'Total Profit' of all the 'G-rated movies' in the Drama Genre from the 'Drama_DataFrame' dataframe. The 'Total Profit' is repeated once for each G-rated movie in the 'Drama_DataFrame' dataframe because it will be used in the Javascript graph below.
var_g = []
for i in profit_int2: var_g.append(sum(sum_g))
print(var_g)
[3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288]
Using a for loop to print the name and profit of all the R-rated movies as Javascript array rows, which will be copied and pasted into the chart cell below.
for x in range(len(name)):
    print("[ '",name[x],"'",',', profit_int[x],'],')
[ ' Django Unchained ' , 349948323 ], [ ' Gone Girl ' , 307567189 ], [ ' Priest ' , 24154026 ], [ ' Fifty Shades Darker ' , 326398492 ], [ ' Fifty Shades Freed ' , 316350619 ], [ ' Crimson Peak ' , 19966854 ], [ ' Zero Dark Thirty ' , 82112435 ], [ ' Fifty Shades of Grey ' , 530998101 ], [ ' The Master ' , 13147416 ], [ ' Flight ' , 129558438 ], [ ' The Ides of March ' , 54735925 ], [ ' Nocturnal Animals ' , 9898681 ], [ ' The Water Diviner ' , 8554727 ], [ ' For Colored Girls ' , 17017873 ], [ ' The Debt ' , 26604054 ], [ ' Let Me In ' , 8270399 ], [ ' Black Swan ' , 318266710 ], [ ' Ex Machina ' , 25358392 ], [ ' Room ' , 23262783 ], [ ' If Beale Street Could Talk ' , 7859167 ], [ ' Arbitrage ' , 23830713 ], [ ' Stoker ' , 34913 ], [ ' Carol ' , 31043521 ], [ ' Quartet ' , 45178935 ], [ ' Hereditary ' , 60133905 ], [ ' Melancholia ' , 12417298 ], [ ' Manchester by the Sea ' , 69233867 ], [ ' We Need to Talk About Kevin ' , 3765283 ], [ ' Addicted ' , 12499242 ], [ ' Mommy ' , 12636004 ], [ ' Take Shelter ' , 222016 ], [ ' Boyhood ' , 53273049 ], [ ' The Witch ' , 36954520 ], [ ' Margin Call ' , 17033227 ], [ ' Whiplash ' , 35669037 ], [ ' Before Midnight ' , 20251930 ], [ ' Silent House ' , 14610760 ], [ ' Winter's Bone ' , 14131551 ], [ ' The Florida Project ' , 9295324 ], [ ' We Are Your Friends ' , 8153415 ], [ ' Locke ' , 88390 ], [ ' Knock Knock ' , 4328516 ], [ ' Buried ' , 19282640 ], [ ' Unsane ' , 12744931 ], [ ' Blue Valentine ' , 15566240 ], [ ' Martha Marcy May Marlene ' , 4438911 ], [ ' Palo Alto ' , 156309 ], [ ' Sound of My Voice ' , 294448 ], [ ' A Ghost Story ' , 2669782 ], [ ' Ordinary People ' , 48766923 ], [ ' Fame ' , 68711836 ], [ ' Endless Love ' , 14718173 ], [ ' Ghost Story ' , 1851683 ], [ ' Zoot Suit ' , 556082 ], [ ' Rich and Famous ' , 1500000 ], [ ' Raggedy Man ' , 2000000 ],
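The printed rows above contain padding spaces around each title, and titles with an apostrophe (for example Winter's Bone) must be escaped by hand before they are valid inside single-quoted Javascript strings. As a hedged alternative, the sketch below uses Python's standard `json.dumps` to quote each title safely. The helper name `to_js_rows` is hypothetical and is not part of the original notebook.

```python
import json


def to_js_rows(names, profits):
    """Format (name, profit) pairs as rows of a Javascript array literal."""
    rows = []
    for title, profit in zip(names, profits):
        # json.dumps wraps the title in double quotes and escapes special
        # characters, so an apostrophe needs no manual backslash.
        # strip() removes any padding spaces around the title.
        rows.append(f"[{json.dumps(title.strip())}, {int(profit)}]")
    return ",\n".join(rows)


# Two R-rated titles and profits taken from the output above.
sample_names = ["Django Unchained", "Winter's Bone"]
sample_profits = [349948323, 14131551]
print(to_js_rows(sample_names, sample_profits))
```

The returned text can be pasted between the square brackets of a series `data` array in the same way as the hand-copied rows.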
Using a for loop to print the name and profit of all the NC-17 rated movies as Javascript array rows, which will be copied and pasted into the chart cell below.
for x in range(len(name4)):
    print("[ '",name4[x],"'",',', profit_int4[x],'],')
[ ' Shame ' , 13912841 ], [ ' Matador ' , 4856268 ], [ ' Whore ' , 8404 ], [ ' Tokyo Decadence ' , 257845 ], [ ' Wide Sargasso Sea ' , 659312 ], [ ' Kids ' , 18912216 ], [ ' Crash ' , 89410061 ], [ ' The Dreamers ' , 121165 ], [ ' Lust, Caution ' , 52091915 ], [ ' Shame ' , 13912841 ], [ ' Blue Is the Warmest Colour ' , 15465835 ], [ ' The Dreamers ' , 307113 ], [ ' Shame ' , 13912841 ], [ ' Blue Is the Warmest Colour ' , 15390895 ], [ ' Blue Valentine ' , 15566240 ], [ ' Two Girls and a Guy ' , 1315026 ], [ ' Elles ' , 256669 ], [ ' Hell ' , 201120004 ], [ ' Se, jie ' , 50167430 ], [ ' The Evil Dead ' , 2311944 ], [ ' Shame ' , 13912841 ], [ ' Arabian Nights ' , 2548651 ], [ ' Natural Born Killers ' , 16283563 ], [ ' Clerks ' , 3664240 ], [ ' Bad Lieutenant ' , 1038916 ], [ ' Beyond the Valley of the Dolls ' , 8000000 ], [ ' Kids ' , 18912216 ], [ ' Crash ' , 94673038 ], [ ' Last Tango in Paris ' , 34897711 ], [ ' Pink Flamingos ' , 401802 ], [ ' Lust, Caution ' , 50167430 ], [ ' Happiness 1998 ' , 3546453 ], [ ' Whore 1991 ' , 958404 ], [ ' Law of Desire ' , 858737 ],
Using a for loop to print the name and profit of all the PG-rated movies as Javascript array rows, which will be copied and pasted into the chart cell below.
for x in range(len(name1)):
    print("[ '",name1[x],"'",',', profit_int1[x],'],')
[ ' Hugo ' , 47784 ], [ ' Dolphin Tale ' , 59068724 ], [ ' Wonder ' , 284604712 ], [ ' The Last Song ' , 72678948 ], [ ' War Room ' , 70975239 ], [ ' The Lunchbox ' , 10531500 ], [ ' Somewhere in Time ' , 4609597 ], [ ' Urban Cowboy ' , 36918287 ], [ ' Cinderella ' , 447351353 ], [ ' War Room ' , 70986904 ], [ ' Wonder ' , 285937718 ], [ ' Little Women ' , 176601214 ], [ ' Overcomer ' , 33102988 ], [ ' The Jazz Singer ' , 26696000 ], [ ' A Walk to Remember ' , 35694916 ], [ ' Tuck Everlasting ' , 4344615 ], [ ' Dreamer ' , 6741732 ], [ ' The Lake House ' , 74830111 ], [ ' Akeelah and the Bee ' , 10948425 ], [ ' Bridge to Terabithia ' , 120587063 ], [ ' August Rush ' , 34605762 ], [ ' Fireproof ' , 32973297 ], [ ' The Last Song ' , 69137047 ], [ ' God's Not Dead ' , 62667874 ], [ ' Mr. Holland's Opus ' , 83269971 ], [ ' Phenomenon ' , 120036382 ], [ ' Contact ' , 81120329 ], [ ' The Spanish Prisoner ' , 3835130 ], [ ' Sense and Sensibility ' , 118582776 ], [ ' The Secret of Roan Inish ' , 3101815 ], [ ' The Remains of the Day ' , 48954968 ], [ ' Pure Country ' , 5164458 ], [ ' Forever Young ' , 107956187 ], [ ' A River Runs Through It ' , 31440294 ], [ ' Honeysuckle Rose ' , 12815212 ], [ ' Resurrection ' , 150297525 ], [ ' Taps ' , 21856053 ], [ ' On Golden Pond ' , 104285432 ], [ ' Absence of Malice ' , 28716963 ], [ ' The Night the Lights Went Out in Georgia ' , 7423752 ], [ ' Rocky III ' , 108052686 ], [ ' Tex ' , 544368315 ], [ ' Staying Alive ' , 42892670 ], [ ' Tender Mercies ' , 3943124 ], [ ' Footloose ' , 71808942 ], [ ' The Natural ' , 20000000 ],
Using a for loop to print the name and profit of all the PG-13 rated movies as Javascript array rows, which will be copied and pasted into the chart cell below.
for x in range(len(name3)):
    print("[ '",name3[x],"'",',', profit_int3[x],'],')
[ ' Gravity ' , 583698673 ], [ ' Sing ' , 559454789 ], [ ' Contagion ' , 77551594 ], [ ' Burlesque ' , 35552675 ], [ ' Creed II ' , 163591522 ], [ ' The Post ' , 129748880 ], [ ' Hereafter ' , 58660270 ], [ ' Anna Karenina ' , 22004627 ], [ ' Arrival ' , 156127894 ], [ ' Charlie St. Cloud ' , 4478084 ], [ ' Bridge of Spies ' , 122498338 ], [ ' The Impossible ' , 129590606 ], [ ' Water for Elephants ' , 78809717 ], [ ' Creed ' , 136567581 ], [ ' The Rite ' , 60143987 ], [ ' Collateral Beauty ' , 49309093 ], [ ' True Grit ' , 217276928 ], [ ' The Tree of Life ' , 26721826 ], [ ' The Longest Ride ' , 29802928 ], [ ' Step Up Revolution ' , 132552290 ], [ ' The Vow ' , 167618160 ], [ ' The Age of Adaline ' , 38984536 ], [ ' Safe Haven ' , 66050951 ], [ ' The Best of Me ' , 15059418 ], [ ' The Help ' , 188120004 ], [ ' Dear John ' , 117033509 ], [ ' The Lucky One ' , 71633833 ], [ ' The Giver ' , 41540205 ], [ ' Draft Day ' , 4847480 ], [ ' Rings ' , 57917283 ], [ ' Fences ' , 40282881 ], [ ' Me Before You ' , 188265198 ], [ ' The Light Between Oceans ' , 2281732 ], [ ' The Book Thief ' , 57086711 ], [ ' A Quiet Place ' , 317522294 ], [ ' Beastly ' , 21028230 ], [ ' The Roommate ' , 36545707 ], [ ' Remember Me ' , 40506120 ], [ ' The Woman in Black ' , 113955898 ], [ ' Country Strong ' , 5601987 ], [ ' One Day ' , 44168692 ], [ ' Suffragette ' , 20044909 ], [ ' The Perks of Being a Wallflower ' , 20069303 ], [ ' Project Almanac ' , 20909437 ], [ ' Wish Upon ' , 11477345 ], [ ' If I Stay ' , 67356170 ], [ ' Brooklyn ' , 51076141 ], [ ' Everything, Everything ' , 51603136 ], [ ' Mud ' , 21556959 ], [ ' Amour ' , 27087044 ], [ ' Ouija: Origin of Evil ' , 72831866 ], [ ' Black or White ' , 12971021 ], [ ' The Bye Bye Man ' , 23787727 ], [ ' Gifted ' , 29964656 ], [ ' The Words ' , 10369708 ], [ ' Lights Out ' , 143806510 ], [ ' Still Alice ' , 36699612 ], [ ' Before I Fall ' , 13945682 ], [ ' Rabbit Hole ' , 1205034 ], [ ' Ida ' , 12698355 ], [ ' Courageous ' , 33185884 ], [ ' Mustang ' , 4152584 ], [ ' Like Crazy ' , 3478400 ], [ ' Another Earth ' , 1927779 ],
Using a for loop to print the name and profit of all the G-rated movies as Javascript array rows, which will be copied and pasted into the chart cell below.
for x in range(len(name2)):
    print("[ '",name2[x],"'",',', profit_int2[x],'],')
[ ' A Sunday in the Country ' , 1711143 ], [ ' Prancer ' , 11587135 ], [ ' The Rookie ' , 58693537 ], [ ' Beauty and the Beast 1991 ' , 418656843 ], [ ' The Little Rascals ' , 43947950 ], [ ' Ramona and Beezus ' , 12469621 ], [ ' The Black Stallion ' , 35099643 ], [ ' The Hunchback of Notre Drame ' , 255500000 ], [ ' Babe ' , 216100000 ], [ ' Pollyanna ' , 1250000 ], [ ' Lassie Come Home ' , 3851000 ], [ ' Charlotte's Web ' , 58985708 ], [ ' Kit Kittredge: An American Girl ' , 7657973 ], [ ' The Rookie ' , 58491516 ], [ ' The Secret Garden ' , 293281000 ], [ ' The Sound of Music ' , 278014195 ], [ ' The Tale of Despereaux ' , 30482317 ], [ ' The Lion King 1994 ' , 941214868 ], [ ' Bambi 1942 ' , 267142000 ], [ ' My Fair Lady 1964 ' , 55071636 ], [ ' Hachiko: A Dog's Story ' , 37707417 ], [ ' Giant ' , 23794409 ], [ ' The Ten Commandments 1966 ' , 52500000 ], [ ' The Quiet Man ' , 5850377 ], [ ' Three Cions in the Fountain ' , 10300000 ],
This is the HTML script that loads the Highcharts library to visualize the Total Profit of each System Rating for all the movies in the Drama Genre within the 'Drama_DataFrame' dataframe, using a 'Column and Line series'. This is done using the Javascript and HTML below, and the chart is saved as a .png file.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<script src="https://code.highcharts.com/themes/sunset.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/column.js"></script>
<figure class="highcharts-figure">
<div id="-"></div>
<p class="highcharts-description">
</p>
</figure>
%%js
function dollarFormat(x) {
return '$' + Highcharts.numberFormat(x, 0, '.', ',');
}
var colors = Highcharts.getOptions().colors;
Highcharts.chart('-', {
chart: {
type: 'column',
inverted: false,
height: 450,
width: 1100,
},
accessibility: {
series: {
descriptionFormatter: function (series) {
return series.type === 'line' ?
series.name + ', ' + dollarFormat(series.points[0].y) :
series.name + ' movie profits, bar series with ' +
series.points.length + ' bars.';
}
},
point: {
valuePrefix: '$'
},
keyboardNavigation: {
seriesNavigation: {
mode: 'serialize'
}
}
},
title: {
text: 'Total Net Profit of each System Rating in the Drama Genre',
margin: 35
},
subtitle: {
text: 'There are five System Ratings: R-rated| G-rated| PG-rated| PG-13 rated| NC-17 rated '
},
xAxis: {
visible: false,
accessibility: {
description: 'Movies',
rangeDescription: ''
}
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
yAxis: [{
min: 0,
max: 900000000,
step: 250000000,
labels: {
format: '${text}'
},
title: {
text: 'Movies Profit'
},
gridLineWidth: 1
}, {
accessibility: {
description: 'System Ratings Category Totals'
},
opposite: true,
min: 0,
max: 7000000000,
step: 1000000000,
gridLineWidth: 0,
labels: {
format: '${text}',
style: {
color: '#8F6666'
}
},
title: {
text: 'System Ratings Category Total',
style: {
color: '#8F6666'
}
}
}],
credits: {
enabled: false
},
plotOptions: {
column: {
keys: ['name', 'y'],
grouping: false,
pointPadding: 0.1,
groupPadding: 0,
tooltip: {
headerFormat: '<span style="font-size: 10px">' +
'<span style="color:{point.color}">\u25CF</span> ' +
'{series.name}</span><br/>',
pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
}
},
line: {
yAxis: 1,
lineWidth: 5,
accessibility: {
exposeAsGroupOnly: true
},
marker: {
enabled: false
},
enableMouseTracking: false,
linkedTo: ':previous',
dataLabels: {
enabled: true,
verticalAlign: 'bottom',
style: {
color: '#757575',
fontWeight: 'normal'
},
formatter: function () {
if (this.point === this.series.points[Math.floor(
this.series.points.length / 2
)]) {
return 'Total: $' + Highcharts.numberFormat(this.y, 0);
}
}
}
}
},
responsive: {
rules: [{
condition: {
maxWidth: 400
},
chartOptions: {
chart: {
spacingLeft: 3,
spacingRight: 5
},
yAxis: [{}, {
visible: false
}]
}
}]
},
series: [{
name: 'System Rating R',
color: '#ff0000',
borderColor: '#A59273',
borderWidth: 1,
data: [
[ ' Django Unchained ' , 349948323 ],
[ ' Gone Girl ' , 307567189 ],
[ ' Priest ' , 24154026 ],
[ ' Fifty Shades Darker ' , 326398492 ],
[ ' Fifty Shades Freed ' , 316350619 ],
[ ' Crimson Peak ' , 19966854 ],
[ ' Zero Dark Thirty ' , 82112435 ],
[ ' The Master ' , 13147416 ],
[ ' Flight ' , 129558438 ],
[ ' The Ides of March ' , 54735925 ],
[ ' Nocturnal Animals ' , 9898681 ],
[ ' The Water Diviner ' , 8554727 ],
[ ' For Colored Girls ' , 17017873 ],
[ ' The Debt ' , 26604054 ],
[ ' Let Me In ' , 8270399 ],
[ ' Black Swan ' , 318266710 ],
[ ' Ex Machina ' , 25358392 ],
[ ' Room ' , 23262783 ],
[ ' If Beale Street Could Talk ' , 7859167 ],
[ ' Arbitrage ' , 23830713 ],
[ ' Stoker ' , 34913 ],
[ ' Carol ' , 31043521 ],
[ ' Quartet ' , 45178935 ],
[ ' Hereditary ' , 60133905 ],
[ ' Melancholia ' , 12417298 ],
[ ' Manchester by the Sea ' , 69233867 ],
[ ' We Need to Talk About Kevin ' , 3765283 ],
[ ' Addicted ' , 12499242 ],
[ ' Mommy ' , 12636004 ],
[ ' Take Shelter ' , 222016 ],
[ ' Boyhood ' , 53273049 ],
[ ' The Witch ' , 36954520 ],
[ ' Margin Call ' , 17033227 ],
[ ' Whiplash ' , 35669037 ],
[ ' Before Midnight ' , 20251930 ],
[ ' Silent House ' , 14610760 ],
[ ' Winter\'s Bone ' , 14131551 ],
[ ' The Florida Project ' , 9295324 ],
[ ' We Are Your Friends ' , 8153415 ],
[ ' Locke ' , 88390 ],
[ ' Knock Knock ' , 4328516 ],
[ ' Buried ' , 19282640 ],
[ ' Unsane ' , 12744931 ],
[ ' Blue Valentine ' , 15566240 ],
[ ' Martha Marcy May Marlene ' , 4438911 ],
[ ' Palo Alto ' , 156309 ],
[ ' Sound of My Voice ' , 294448 ],
[ ' A Ghost Story ' , 2669782 ],
[ ' Ordinary People ' , 48766923 ],
[ ' Fame ' , 68711836 ],
[ ' Endless Love ' , 14718173 ],
[ ' Ghost Story ' , 1851683 ],
[ ' Zoot Suit ' , 556082 ],
[ ' Rich and Famous ' , 1500000 ],
[ ' Raggedy Man ' , 2000000 ],
]
}, {
type: 'line',
name: 'System Rating R',
data: [
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978
],
color: '#ff1919'
}, {
name: 'System Rating NC-17',
color: '#d61111',
data: [
[ ' Shame ' , 13912841 ],
[ ' Matador ' , 4856268 ],
[ ' Whore ' , 8404 ],
[ ' Tokyo Decadence ' , 257845 ],
[ ' Wide Sargasso Sea ' , 659312 ],
[ ' Kids ' , 18912216 ],
[ ' Crash ' , 89410061 ],
[ ' The Dreamers ' , 121165 ],
[ ' Lust, Caution ' , 52091915 ],
[ ' Shame ' , 13912841 ],
[ ' Blue Is the Warmest Colour ' , 15465835 ],
[ ' The Dreamers ' , 307113 ],
[ ' Shame ' , 13912841 ],
[ ' Blue Is the Warmest Colour ' , 15390895 ],
[ ' Blue Valentine ' , 15566240 ],
[ ' Two Girls and a Guy ' , 1315026 ],
[ ' Elles ' , 256669 ],
[ ' Se, jie ' , 50167430 ],
[ ' The Evil Dead ' , 2311944 ],
[ ' Shame ' , 13912841 ],
[ ' Arabian Nights ' , 2548651 ],
[ ' Natural Born Killers ' , 16283563 ],
[ ' Clerks ' , 3664240 ],
[ ' Bad Lieutenant ' , 1038916 ],
[ ' Beyond the Valley of the Dolls ' , 8000000 ],
[ ' Kids ' , 18912216 ],
[ ' Crash ' , 94673038 ],
[ ' Last Tango in Paris ' , 34897711 ],
[ ' Pink Flamingos ' , 401802 ],
[ ' Lust, Caution ' , 50167430 ],
[ ' Happiness 1998 ' , 3546453 ],
[ ' Whore 1991 ' , 958404 ],
[ ' Law of Desire ' , 858737 ],
],
pointStart: 59
}, {
type: 'line',
name: 'System Rating NC-17',
data: [
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867
],
pointStart: 59,
color: '#d61111'
}, {
name: 'System Rating PG',
color: '#a10505',
data: [
[ ' Hugo ' , 47784 ],
[ ' Dolphin Tale ' , 59068724 ],
[ ' Wonder ' , 284604712 ],
[ ' The Last Song ' , 72678948 ],
[ ' War Room ' , 70975239 ],
[ ' The Lunchbox ' , 10531500 ],
[ ' Somewhere in Time ' , 4609597 ],
[ ' Urban Cowboy ' , 36918287 ],
[ ' Cinderella ' , 447351353 ],
[ ' War Room ' , 70986904 ],
[ ' Wonder ' , 285937718 ],
[ ' Little Women ' , 176601214 ],
[ ' Overcomer ' , 33102988 ],
[ ' The Jazz Singer ' , 26696000 ],
[ ' A Walk to Remember ' , 35694916 ],
[ ' Tuck Everlasting ' , 4344615 ],
[ ' Dreamer ' , 6741732 ],
[ ' The Lake House ' , 74830111 ],
[ ' Akeelah and the Bee ' , 10948425 ],
[ ' Bridge to Terabithia ' , 120587063 ],
[ ' August Rush ' , 34605762 ],
[ ' Fireproof ' , 32973297 ],
[ ' The Last Song ' , 69137047 ],
[ ' God\'s Not Dead ' , 62667874 ],
[ ' Mr. Holland\'s Opus ' , 83269971 ],
[ ' Phenomenon ' , 120036382 ],
[ ' Contact ' , 81120329 ],
[ ' The Spanish Prisoner ' , 3835130 ],
[ ' Sense and Sensibility ' , 118582776 ],
[ ' The Secret of Roan Inish ' , 3101815 ],
[ ' The Remains of the Day ' , 48954968 ],
[ ' Pure Country ' , 5164458 ],
[ ' Forever Young ' , 107956187 ],
[ ' A River Runs Through It ' , 31440294 ],
[ ' Honeysuckle Rose ' , 12815212 ],
[ ' Resurrection ' , 150297525 ],
[ ' Taps ' , 21856053 ],
[ ' On Golden Pond ' , 104285432 ],
[ ' Absence of Malice ' , 28716963 ],
[ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
[ ' Rocky III ' , 108052686 ],
[ ' Tex ' , 544368315 ],
[ ' Staying Alive ' , 42892670 ],
[ ' Tender Mercies ' , 3943124 ],
[ ' Footloose ' , 71808942 ],
[ ' The Natural ' , 20000000 ],
],
pointStart: 96
}, {
type: 'line',
name: 'System Rating PG',
data: [
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794,
],
pointStart: 96,
color: '#a10505'
}, {
name: 'System Rating PG-13',
color: '#7a2f2f',
data: [
[ ' Gravity ' , 583698673 ],
[ ' Sing ' , 559454789 ],
[ ' Contagion ' , 77551594 ],
[ ' Burlesque ' , 35552675 ],
[ ' Creed II ' , 163591522 ],
[ ' The Post ' , 129748880 ],
[ ' Hereafter ' , 58660270 ],
[ ' Anna Karenina ' , 22004627 ],
[ ' Arrival ' , 156127894 ],
[ ' Charlie St. Cloud ' , 4478084 ],
[ ' Bridge of Spies ' , 122498338 ],
[ ' The Impossible ' , 129590606 ],
[ ' Water for Elephants ' , 78809717 ],
[ ' Creed ' , 136567581 ],
[ ' The Rite ' , 60143987 ],
[ ' Collateral Beauty ' , 49309093 ],
[ ' True Grit ' , 217276928 ],
[ ' The Tree of Life ' , 26721826 ],
[ ' The Longest Ride ' , 29802928 ],
[ ' Step Up Revolution ' , 132552290 ],
[ ' The Vow ' , 167618160 ],
[ ' The Age of Adaline ' , 38984536 ],
[ ' Safe Haven ' , 66050951 ],
[ ' The Best of Me ' , 15059418 ],
[ ' The Help ' , 188120004 ],
[ ' Dear John ' , 117033509 ],
[ ' The Lucky One ' , 71633833 ],
[ ' The Giver ' , 41540205 ],
[ ' Draft Day ' , 4847480 ],
[ ' Rings ' , 57917283 ],
[ ' Fences ' , 40282881 ],
[ ' Me Before You ' , 188265198 ],
[ ' The Light Between Oceans ' , 2281732 ],
[ ' The Book Thief ' , 57086711 ],
[ ' A Quiet Place ' , 317522294 ],
[ ' Beastly ' , 21028230 ],
[ ' The Roommate ' , 36545707 ],
[ ' Remember Me ' , 40506120 ],
[ ' The Woman in Black ' , 113955898 ],
[ ' Country Strong ' , 5601987 ],
[ ' One Day ' , 44168692 ],
[ ' Suffragette ' , 20044909 ],
[ ' The Perks of Being a Wallflower ' , 20069303 ],
[ ' Project Almanac ' , 20909437 ],
[ ' Wish Upon ' , 11477345 ],
[ ' If I Stay ' , 67356170 ],
[ ' Brooklyn ' , 51076141 ],
[ ' Everything, Everything ' , 51603136 ],
[ ' Mud ' , 21556959 ],
[ ' Amour ' , 27087044 ],
[ ' Ouija: Origin of Evil ' , 72831866 ],
[ ' Black or White ' , 12971021 ],
[ ' The Bye Bye Man ' , 23787727 ],
[ ' Gifted ' , 29964656 ],
[ ' The Words ' , 10369708 ],
[ ' Lights Out ' , 143806510 ],
[ ' Still Alice ' , 36699612 ],
[ ' Before I Fall ' , 13945682 ],
[ ' Rabbit Hole ' , 1205034 ],
[ ' Ida ' , 12698355 ],
[ ' Courageous ' , 33185884 ],
[ ' Mustang ' , 4152584 ],
[ ' Like Crazy ' , 3478400 ],
[ ' Another Earth ' , 1927779 ]
],
pointStart: 150
}, {
type: 'line',
name: 'System Rating PG-13',
data: [
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393,
],
pointStart: 150,
color: '#7a2f2f',
},{
name: 'System Rating G',
color: '#4d0909',
borderWidth: 1,
data: [
[ ' A Sunday in the Country ' , 1711143 ],
[ ' Prancer ' , 11587135 ],
[ ' The Rookie ' , 58693537 ],
[ ' Beauty and the Beast 1991 ' , 418656843 ],
[ ' The Little Rascals ' , 43947950 ],
[ ' Ramona and Beezus ' , 12469621 ],
[ ' The Black Stallion ' , 35099643 ],
[ ' The Hunchback of Notre Dame ' , 255500000 ],
[ ' Babe ' , 216100000 ],
[ ' Pollyanna ' , 1250000 ],
[ ' Lassie Come Home ' , 3851000 ],
[ ' Charlotte\'s Web ' , 58985708 ],
[ ' Kit Kittredge: An American Girl ' , 7657973 ],
[ ' The Rookie ' , 58491516 ],
[ ' The Secret Garden ' , 293281000 ],
[ ' The Sound of Music ' , 278014195 ],
[ ' The Tale of Despereaux ' , 30482317 ],
[ ' Bambi 1942 ' , 267142000 ],
[ ' My Fair Lady 1964 ' , 55071636 ],
[ ' Hachiko: A Dog\'s Story ' , 37707417 ],
[ ' Giant ' , 23794409 ],
[ ' The Ten Commandments 1966 ' , 52500000 ],
[ ' The Quiet Man ' , 5850377 ],
[ ' Three Coins in the Fountain ' , 10300000 ],
],
pointStart:216
}, {
type: 'line',
name: 'System Rating G',
data: [
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288
],
pointStart: 216,
color: '#4d0909'
}]
});
This is the blueprint for creating the seventh visualization, Movies that made Profit. Highcharts will be used to create this graph.
Blueprint:
The graph used for this visualization is the Highcharts Lollipop series, which is found in the Highcharts Demos. Lollipop charts are variants of column charts, with a circle marker for the data value and a line extending to the axis. The first approach to this chart is understanding the format of the script. This graph has two types of code, HTML and JavaScript. JupyterLab has magic commands that support HTML and JavaScript; it uses the '%' syntax element for magics, with cell magics such as '%%HTML' written with a doubled '%%'.
Before the HTML and JavaScript are written, a dataframe needs to be made to extract all the movies that made a profit in the Drama genre. The two variables needed from the parent dataframe 'all_drama_info1' are the movie name and the profit made, as an integer.
HTML Section:
To start off, the magic command '%%HTML' should be written first in the cell. The main script lines used to create Highcharts graphs throughout this project are:
For this particular graph, this script is used to get the lollipop effect:
To close off the HTML script, this script is used. To identify and differentiate this class from others, the div id has to be named, and if the graph is used more than once, each div id needs a different name or the chart will not work:
Javascript Section:
To start off, the magic command '%%js' is used to write JavaScript in the JupyterLab cell. The JavaScript section will have comments and explanations, but for further information go to the Highcharts Demos. The normal layout is horizontal, but for this graph it is going to be vertical. The main JavaScript used to create this lollipop graph is: name: ' The Sound of Music ', low: 2303109231. The name variable holds the name of the movie, and the low variable holds the amount of profit made, as an integer.
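The name/low pairs described above can be produced from Python rather than typed by hand. The following is a minimal sketch, not the notebook's actual method: the helper name `lollipop_points` is hypothetical, and the three sample rows are taken from the Drama_DataFrame rows displayed in this section.

```python
import json

# Sample rows copied from the dataframe preview; in the notebook these would
# come from the Movie and Profit columns of the drama dataframe.
names = ["Gravity", "Sing", "Django Unchained"]
profits = [583698673.0, 559454789.0, 349948323.0]


def lollipop_points(names, profits):
    """Return one {'name': ..., 'low': ...} dict per movie that made a profit."""
    return [
        {"name": name, "low": int(profit)}
        for name, profit in zip(names, profits)
        if profit > 0  # keep only movies that made a profit
    ]


points = lollipop_points(names, profits)

# json.dumps gives text that can be pasted into the 'data' list of the series.
print(json.dumps(points, indent=2))
```

Casting the profit to `int` matches the integer values expected in the `low` field, and the `profit > 0` filter keeps loss-making movies out of this chart.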
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
Getting the 'Cost' (production budget) of the Drama-genre movies that had losses in the 'Drama_DataFrame' dataframe.
bud_loss = []
for i, x in enumerate(Drama_DataFrame.Profit):
    if x < 0:
        bud_loss.append(Drama_DataFrame.Production_Budget[i])
print(bud_loss)
[150000000.0, 68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0, 27500000.0, 25000000.0, 22000000.0, 21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0, 7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 72000000.0, 65000000.0, 9000000.0, 11000000.0, 45000000.0, 15000000.0, 10000000.0, 27000000.0, 25000000.0, 34000000.0, 15000000.0, 28300000.0, 8000000.0, 9000000.0, 15000000.0, 5000000.0, 4500000.0, 8000000.0, 16000000.0, 45000000.0, 5000000.0, 2734384.0, 35446775.0, 8600000.0, 18000000.0, 4400000.0, 17000000.0, 26000000.0, 6500000.0, 22000000.0, 90000000.0, 17000000.0, 300000.0, 3000000.0, 45000000.0, 10000000.0, 19000000.0, 1000000.0, 4700000.0, 3000000.0, 700000.0, 3200000.0, 1300000.0, 15000000.0, 6400000.0, 3259572.0]
Getting the 'Revenue' (worldwide gross) of the Drama-genre movies that had losses in the 'Drama_DataFrame' dataframe.
rev_loss = []
for i, x in enumerate(Drama_DataFrame.Profit):
    if x < 0:
        rev_loss.append(Drama_DataFrame.Worldwide_Gross[i])
print(rev_loss)
[142634358, 54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 37306334, 43545364, 3438735, 8526288, 35656130, 3987768, 7025496, 14859394, 10769960, 32255440, 2819485, 14920781, 3281232, 6668025, 199078, 4786789, 2044892, 2400000, 1705908, 20350754, 496059, 1022148, 195494, 1025228, 8721243, 40300, 10015449, 636796, 2447576, 9171289, 69131860, 10015449, 108998, 592861, 37750754, 4659110, 1236844, 205569, 2094302, 2783535, 103093, 690872, 627287, 1914166, 2561820, 1022148]
Getting the 'Amount of Money Lost' (negative profit) of the Drama-genre movies that had losses in the 'Drama_DataFrame' dataframe.
money_loss = []
for i, x in enumerate(Drama_DataFrame.Profit):
    if x < 0:
        money_loss.append(Drama_DataFrame.Profit[i])
print(money_loss)
[-7365642.0, -13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0, -11684491.0, -18207232.0, -17934980.0, -15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0, -7820377.0, -6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -34693666.0, -21454636.0, -5561265.0, -2473712.0, -9343870.0, -11012232.0, -2974504.0, -12140606.0, -14230040.0, -1744560.0, -12180515.0, -13379219.0, -4718768.0, -2331975.0, -14800922.0, -213211.0, -2455108.0, -5600000.0, -14294092.0, -24649246.0, -4503941.0, -1712236.0, -35251281.0, -7574772.0, -9278757.0, -4359700.0, -6984551.0, -25363204.0, -4052424.0, -12828711.0, -20868140.0, -6984551.0, -191002.0, -2407139.0, -7249246.0, -5340890.0, -17763156.0, -794431.0, -2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -13085834.0, -3838180.0, -2237424.0]
Getting the 'Names' of the Drama-genre movies that had losses in the 'Drama_DataFrame' dataframe.
name_loss = []
for i, x in enumerate(Drama_DataFrame.Profit):
    if x < 0:
        name_loss.append(Drama_DataFrame.Movie[i])
print(name_loss)
['The Wolfman', 'Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures', 'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches', 'The Majestic', 'We Are Marshall', 'The Ultimate Gift', 'What If...', 'The Indian in the Cupboard', 'Fluke', 'Three Wishes', 'Music of the Heart', 'Gettysburg', 'The Age of Innocence', 'Newsies', 'Ragtime', 'Looker', 'Six Weeks', 'Five Days One Summer', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Man, Woman and Child', 'Showgirls', 'Bent', 'Ma mère', 'La traviata', 'Little Dorrit', 'The Secret Garden', 'Through the Olive Trees', 'A Little Princess', 'One from the Heart', 'The Hand', 'Pennies from Heaven', 'Babe: Pig in the City', 'A Little Princess', 'Before the Wrath', 'Miracle of Marcelino', 'Showgirls', 'Killer Joe', 'Queen of Hearts', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'A Dirty Shame', 'Young Adam', 'Ma Mère']
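The four loops above all apply the same condition (Profit below zero) to different columns. As a hedged alternative, the same lists can be sketched with a single pandas boolean mask. The small `sample` dataframe below is a hypothetical stand-in for 'Drama_DataFrame', built from four of the rows shown in the preview; only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical miniature stand-in for Drama_DataFrame (values copied from
# the rows displayed above).
sample = pd.DataFrame({
    "Movie": ["Hugo", "The Wolfman", "Gravity", "A Dirty Shame"],
    "Production_Budget": [180000000.0, 150000000.0, 110000000.0, 15000000.0],
    "Worldwide_Gross": [180047784, 142634358, 693698673, 1914166],
    "Profit": [47784.0, -7365642.0, 583698673.0, -13085834.0],
})

# Select the loss-making rows once, then read each column from that subset.
losses = sample[sample["Profit"] < 0]

bud_loss = losses["Production_Budget"].tolist()
rev_loss = losses["Worldwide_Gross"].tolist()
money_loss = losses["Profit"].tolist()
name_loss = losses["Movie"].tolist()

print(name_loss)
```

Because the four lists come from the same filtered rows, they stay aligned by position. The mask also does not depend on the row labels, whereas the loops above look rows up by the counter `i` and therefore assume the dataframe has the default 0, 1, 2, ... index.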
Getting the indices of the movies whose budget is between $0 and $8 million (inclusive).
stor1 = []
for i, x in enumerate(bud_loss):
    if x <= 8000000:
        stor1.append(i)
len(stor1)
27
Using the indices stored in 'stor1' to get the names of the movies whose budget is between $0 and $8 million.
n1 = []
for i in stor1:
    n1.append(name_loss[i])
print(n1)
['Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches', 'Looker', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Bent', 'Ma mère', 'Through the Olive Trees', 'The Hand', 'Before the Wrath', 'Miracle of Marcelino', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'Young Adam', 'Ma Mère']
Using the indices stored in 'stor1' to get the budgets of the movies whose budget is between $0 and $8 million.
b1 = []
for i in stor1:
    b1.append(bud_loss[i])
print(b1)
[7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 8000000.0, 5000000.0, 4500000.0, 8000000.0, 5000000.0, 2734384.0, 4400000.0, 6500000.0, 300000.0, 3000000.0, 1000000.0, 4700000.0, 3000000.0, 700000.0, 3200000.0, 1300000.0, 6400000.0, 3259572.0]
Using the indices stored in 'stor1' to get the revenue of the movies whose budget is between $0 and $8 million.
r1 = []
for i in stor1:
    r1.append(rev_loss[i])
print(r1)
[382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 3281232, 4786789, 2044892, 2400000, 496059, 1022148, 40300, 2447576, 108998, 592861, 205569, 2094302, 2783535, 103093, 690872, 627287, 2561820, 1022148]
Using the indices stored in 'stor1' to get the amount of money lost by the movies whose budget is between $0 and $8 million.
l1 = []
for i in stor1:
    l1.append(money_loss[i])
print(l1)
[-6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -4718768.0, -213211.0, -2455108.0, -5600000.0, -4503941.0, -1712236.0, -4359700.0, -4052424.0, -191002.0, -2407139.0, -794431.0, -2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -3838180.0, -2237424.0]
Getting the indices of the movies whose budget is above $8 million and up to $21 million.
stor2 = []
for i, x in enumerate(bud_loss):
    if 8000000 < x <= 21000000:
        stor2.append(i)
len(stor2)
26
Using the indices stored in 'stor2' to get the names of the movies whose budget is above $8 million and up to $21 million.
n2 = []
for i in stor2:
    n2.append(name_loss[i])
print(n2)
['The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go', 'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'The Ultimate Gift', 'What If...', 'Fluke', 'Three Wishes', 'Newsies', 'Six Weeks', 'Five Days One Summer', 'Man, Woman and Child', 'Little Dorrit', 'The Secret Garden', 'A Little Princess', 'A Little Princess', 'Killer Joe', 'Queen of Hearts', 'A Dirty Shame']
Using the indices stored in 'stor2' to get the budgets of the movies whose budget is above $8 million and up to $21 million.
b2 = []
for i in stor2:
    b2.append(bud_loss[i])
print(b2)
[21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0, 9000000.0, 11000000.0, 15000000.0, 10000000.0, 15000000.0, 9000000.0, 15000000.0, 16000000.0, 8600000.0, 18000000.0, 17000000.0, 17000000.0, 10000000.0, 19000000.0, 15000000.0]
Using the indices stored in 'stor2' to get the revenue of the movies whose budget is above $8 million and up to $21 million.
r2 = []
for i in stor2:
    r2.append(rev_loss[i])
print(r2)
[5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 3438735, 8526288, 3987768, 7025496, 2819485, 6668025, 199078, 1705908, 1025228, 8721243, 10015449, 10015449, 4659110, 1236844, 1914166]
Using the indices stored in 'stor2' to get the amount of money lost by the movies whose budget is above $8 million and up to $21 million.
l2 = []
for i in stor2:
    l2.append(money_loss[i])
print(l2)
[-15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0, -7820377.0, -5561265.0, -2473712.0, -11012232.0, -2974504.0, -12180515.0, -2331975.0, -14800922.0, -14294092.0, -7574772.0, -9278757.0, -6984551.0, -6984551.0, -5340890.0, -17763156.0, -13085834.0]
Getting the indices of the movies whose budget is above $21 million.
stor3 = []
for i, x in enumerate(bud_loss):
    if x > 21000000:
        stor3.append(i)
len(stor3)
26
Using the indices stored in 'stor3' to get the names of the movies whose budget is above $21 million.
n3 = []
for i in stor3:
    n3.append(name_loss[i])
print(n3)
['The Wolfman', 'Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures', 'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Majestic', 'We Are Marshall', 'The Indian in the Cupboard', 'Music of the Heart', 'Gettysburg', 'The Age of Innocence', 'Ragtime', 'Showgirls', 'La traviata', 'One from the Heart', 'Pennies from Heaven', 'Babe: Pig in the City', 'Showgirls']
Using the indices stored in 'stor3' to get the budgets of the movies whose budget is above $21 million.
b3 = []
for i in stor3:
    b3.append(bud_loss[i])
print(b3)
[150000000.0, 68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0, 27500000.0, 25000000.0, 22000000.0, 72000000.0, 65000000.0, 45000000.0, 27000000.0, 25000000.0, 34000000.0, 28300000.0, 45000000.0, 35446775.0, 26000000.0, 22000000.0, 90000000.0, 45000000.0]
Using the indices stored in 'stor3' to get the revenue of the movies whose budget is above $21 million.
r3 = []
for i in stor3:
    r3.append(rev_loss[i])
print(r3)
[142634358, 54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 37306334, 43545364, 35656130, 14859394, 10769960, 32255440, 14920781, 20350754, 195494, 636796, 9171289, 69131860, 37750754]
Using the indices stored in 'stor3' to get the amount of money lost by the movies whose budget is above $21 million.
l3 = []
for i in stor3:
    l3.append(money_loss[i])
print(l3)
[-7365642.0, -13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0, -11684491.0, -18207232.0, -17934980.0, -34693666.0, -21454636.0, -9343870.0, -12140606.0, -14230040.0, -1744560.0, -13379219.0, -24649246.0, -35251281.0, -25363204.0, -12828711.0, -20868140.0, -7249246.0]
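The three budget bands above ($0 to $8 million, above $8 million up to $21 million, and above $21 million) can also be assigned in one pass. This is a minimal sketch using the standard-library `bisect` module; the names `BAND_EDGES`, `band_index` and `split_by_band` are hypothetical and the sample budgets are a few values picked from 'bud_loss'.

```python
from bisect import bisect_left

# Upper edges of the first two bands, matching the conditions used above:
# x <= 8000000, 8000000 < x <= 21000000, x > 21000000.
BAND_EDGES = [8_000_000, 21_000_000]


def band_index(budget):
    """Return 0, 1 or 2 for the low, middle or high budget band."""
    # bisect_left counts how many edges are strictly below the budget,
    # so a budget equal to an edge stays in the lower band.
    return bisect_left(BAND_EDGES, budget)


def split_by_band(budgets):
    """Return three lists of positions, one list per budget band."""
    bands = [[], [], []]
    for position, budget in enumerate(budgets):
        bands[band_index(budget)].append(position)
    return bands


sample_budgets = [150000000.0, 21000000.0, 8000000.0, 8600000.0, 250000.0]
stor1, stor2, stor3 = split_by_band(sample_budgets)
print(stor1, stor2, stor3)
```

Keeping the band edges in one list means the boundaries are written once, so the three position lists cannot overlap or leave a gap between bands.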
This is 'Part One' of the HTML script using the Highcharts library to visualize the losses of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. Part One covers the movies with a budget of up to $8 million. This is done using the JavaScript and HTML below, and the chart will be saved as a .png file.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="w"></div>
<p class="highcharts-description">
</p>
</figure>
%%js
Highcharts.chart('w',{
chart:{
type:'column',
height:400,
width:700
},
title:{
text:'Movies That Did Not Make Any Profit'
},
xAxis:{
categories:['Hesher', 'Everything Must Go', 'Maggie', 'Anna', 'Stake Land', 'I Origins', 'The Invitation', 'The Canyons', 'Cattle Annie and Little Britches',
'Looker', 'Eddie and the Cruisers', 'Testament', 'Table for Five', 'Bent', 'Ma mère', 'Through the Olive Trees', 'The Hand', 'Before the Wrath',
'Miracle of Marcelino', 'Man Bites Dog', 'Nymphomaniac: Vol. I', 'Frontier(s)', 'Chained', 'The Big Feast', 'Orgazmo', 'Young Adam', 'Ma Mère'],
labels:{
enabled:false
}
},
credits:{
enabled:false
},
colors:['red','#900505','#FF5F84'],
yAxis:{
min:-6000000,
max:8000000,
step:500000,
},
series:[{
name:'Cost',
data:[7000000.0, 5000000.0, 4500000.0, 4357373.0, 4000000.0, 1000000.0, 1000000.0, 250000.0, 5100000.0, 8000000.0, 5000000.0,
4500000.0, 8000000.0, 5000000.0, 2734384.0, 4400000.0, 6500000.0, 300000.0, 3000000.0, 1000000.0, 4700000.0, 3000000.0,
700000.0, 3200000.0, 1300000.0, 6400000.0, 3259572.0 ]
},{
name:'Loss',
data:[-6617054.0, -2178990.0, -3472240.0, -3157373.0, -3320518.0, -147601.0, -645164.0, -187625.0, -4565184.0, -4718768.0,
-213211.0, -2455108.0, -5600000.0, -4503941.0, -1712236.0, -4359700.0, -4052424.0, -191002.0, -2407139.0, -794431.0,
-2605698.0, -216465.0, -596907.0, -2509128.0, -672713.0, -3838180.0, -2237424.0]
},{
name:'Revenue',
data:[382946, 2821010, 1027760, 1200000, 679482, 852399, 354836, 62375, 534816, 3281232, 4786789, 2044892, 2400000, 496059,
1022148, 40300, 2447576, 108998, 592861, 205569, 2094302, 2783535, 103093, 690872, 627287, 2561820, 1022148]
}]
});
This is 'Part Two' of the HTML script using the Highcharts library to visualize the losses of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. Part Two covers the movies with a budget above $8 million and up to $21 million. This is done using the JavaScript and HTML below, and the chart will be saved as a .png file.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="f"></div>
<p class="highcharts-description">
</p>
</figure>
%%js
Highcharts.chart('f',{
chart:{
type:'column',
height:400,
width:700
},
title:{
text:'Movies That Did Not Make Any Profit'
},
xAxis:{
categories:['The Beaver', 'By the Sea', 'Labor Day', 'Midnight Special', 'Miss Sloane', 'The Homesman', 'The Immigrant', 'Never Let Me Go',
'The Reluctant Fundamentalist', 'Chloe', 'Coriolanus', 'The Ultimate Gift', 'What If...', 'Fluke', 'Three Wishes', 'Newsies',
'Six Weeks', 'Five Days One Summer', 'Man, Woman and Child', 'Little Dorrit', 'The Secret Garden', 'A Little Princess',
'A Little Princess', 'Killer Joe', 'Queen of Hearts', 'A Dirty Shame'
],
labels:{
enabled:false
}},
credits:{
enabled:false
},
colors:['red','#900505','#FF5F84'],
yAxis:{
labels:{
enabled:true
},
min:-15000000,
max:25000000,
step:5000000,
},
series:[{
name:'Cost',
data: [21000000.0, 20000000.0, 18000000.0, 18000000.0, 18000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 13000000.0, 10000000.0,
9000000.0, 11000000.0, 15000000.0, 10000000.0, 15000000.0, 9000000.0, 15000000.0, 16000000.0, 8600000.0, 18000000.0, 17000000.0,
17000000.0, 10000000.0, 19000000.0, 15000000.0 ]
},{
name:'Loss',
data:[-15953962.0, -16272254.0, -3810190.0, -10319750.0, -10280370.0, -7782429.0, -8414989.0, -3826282.0, -14471269.0, -1168869.0,
-7820377.0, -5561265.0, -2473712.0, -11012232.0, -2974504.0, -12180515.0, -2331975.0, -14800922.0, -14294092.0, -7574772.0,
-9278757.0, -6984551.0, -6984551.0, -5340890.0, -17763156.0, -13085834.0]
},{
name:'Revenue',
data:[5046038, 3727746, 14189810, 7680250, 7719630, 8217571, 7585011, 11173718, 528731, 11831131, 2179623, 3438735, 8526288, 3987768, 7025496,
2819485, 6668025, 199078, 1705908, 1025228, 8721243, 10015449, 10015449, 4659110, 1236844, 1914166]
}]
});
This is 'Part Three' of the HTML script using the Highcharts library to visualize the losses of the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Column Series'. Part Three covers the movies with a budget above $21 million, excluding 'The Wolfman', which is left out of the lists below. This is done using the JavaScript and HTML below, and the chart will be saved as a .png file.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="k"></div>
<p class="highcharts-description">
</p>
</figure>
%%js
Highcharts.chart('k',{
chart:{
type:'column',
height:400,
width:700
},
title:{
text:'Movies That Did Not Make Any Profit'
},
xAxis:{
categories:['Downsizing', 'Trouble with the Curve', 'Dream House', 'Upside Down', 'Paranoia', 'Victor Frankenstein', 'Biutiful', 'Extraordinary Measures',
'The Space Between Us', 'Anonymous', 'Tulip Fever', 'Stone', 'The Majestic', 'We Are Marshall', 'The Indian in the Cupboard', 'Music of the Heart', 'Gettysburg',
'The Age of Innocence', 'Ragtime', 'Showgirls', 'La traviata', 'One from the Heart', 'Pennies from Heaven', 'Babe: Pig in the City', 'Showgirls'
],
labels:{
enabled:false
}},
credits:{
enabled:false
},
colors:['red','#900505','#FF5F84'],
yAxis:{
labels:{
enabled:true
},
min:-35000000,
max:70000000,
step:500000,
},
series:[{
name:'Cost',
data: [68000000.0, 60000000.0, 50000000.0, 50000000.0, 40000000.0, 40000000.0, 35000000.0, 31000000.0, 30000000.0,
27500000.0, 25000000.0, 22000000.0, 72000000.0, 65000000.0, 45000000.0, 27000000.0, 25000000.0, 34000000.0, 28300000.0,
45000000.0, 35446775.0, 26000000.0, 22000000.0, 90000000.0, 45000000.0]
},{
name:'Loss',
data:[-13537029.0, -12181087.0, -8357834.0, -23612961.0, -23659233.0, -8875633.0, -10312476.0, -15173016.0, -13518595.0,
-11684491.0, -18207232.0, -17934980.0, -34693666.0, -21454636.0, -9343870.0, -12140606.0, -14230040.0, -1744560.0, -13379219.0,
-24649246.0, -35251281.0, -25363204.0, -12828711.0, -20868140.0, -7249246.0 ]
},{
name:'Revenue',
data:[54462971, 47818913, 41642166, 26387039, 16340767, 31124367, 24687524, 15826984, 16481405, 15815509, 6792768, 4065020, 37306334,
43545364, 35656130, 14859394, 10769960, 32255440, 14920781, 20350754, 195494, 636796, 9171289, 69131860, 37750754]
}]
});
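Since the three chart cells above differ only in their div id, movie names and three data lists, the options object could be generated from the Python lists instead of being pasted by hand, which reduces the chance of a list ending up in the wrong chart. The sketch below is one possible approach under stated assumptions: `column_chart_options` is a hypothetical helper, not part of Highcharts, and it only reproduces the option keys already used in the cells above.

```python
import json


def column_chart_options(title, names, cost, loss, revenue):
    """Build the options dict for the Cost / Loss / Revenue column chart."""
    if not (len(names) == len(cost) == len(loss) == len(revenue)):
        raise ValueError("every series needs exactly one value per movie")
    return {
        "chart": {"type": "column", "height": 400, "width": 700},
        "title": {"text": title},
        "xAxis": {"categories": names, "labels": {"enabled": False}},
        "credits": {"enabled": False},
        "colors": ["red", "#900505", "#FF5F84"],
        "series": [
            {"name": "Cost", "data": cost},
            {"name": "Loss", "data": loss},
            {"name": "Revenue", "data": revenue},
        ],
    }


# First three movies of the lowest budget band, taken from n1, b1, l1 and r1.
options = column_chart_options(
    "Movies That Did Not Make Any Profit",
    ["Hesher", "Everything Must Go", "Maggie"],
    [7000000.0, 5000000.0, 4500000.0],
    [-6617054.0, -2178990.0, -3472240.0],
    [382946, 2821010, 1027760],
)

# JSON is valid JavaScript object syntax, so the text can go in a %%js cell.
js_cell = "Highcharts.chart('w', " + json.dumps(options, indent=2) + ");"
print(js_cell)
```

The length check only guards against lists of different sizes; it cannot tell whether a list belongs to the right budget band, so passing n1, b1, l1 and r1 together (and likewise for the other bands) is still up to the caller.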
This is the blueprint for creating the ninth and tenth visualizations, Top 20 Highest Profitable Movies and Top 20 Lowest Profitable Movies. Highcharts will be used to create these graphs.
Blueprint:
The graphs used for these visualizations are the Highcharts 3D Cylinder Chart for the Top 20 Highest Profitable Movies and the Highcharts Chart Rotation series for the Top 20 Lowest Profitable Movies, both of which are found in the Highcharts Demos. The chart rotation series has Alpha, Beta and Depth angles, which can be adjusted to rotate the graph.
The first approach to this chart is understanding the format of the code script. Like the previous graph, this chart is constructed from two different types of code, HTML and JavaScript. The HTML section is very similar to the previous graph; the only difference is this HTML code: This allows the graphs to be 3D and also allows the cylinder shape. The closing script is also similar to the previous one, but with its own unique name.
This is for the cylinder graph:
This is for the 3D column graph:
The second approach is scripting the JavaScript section. The JavaScript section is very simple, but before that some data needs to be extracted from the parent dataframe. These are the elements that need to be extracted for this visualization: for the Top 20 Highest Profitable Movies, the name and amount of profit made; for the Top 20 Lowest Profitable Movies, the name and amount of profit made, as an integer, of all the movies in that category.
Only one sub-section in each graph is particular to it: the data sub-section inside the series section. It is a sequence of lists, each consisting of the movie name and the amount of profit made, as an integer.
This is from the first graph; ['Avatar|1st Highest',2351345279], ['The Sound of Music|2nd Highest', 2303109231], ['Black Panther|3rd Highest',1148258224]
This is from the second graph; [' Miami Vice|170th Highest', 28818556 ], [' Space Chimps|171th Highest', 28097693 ], [' The Tale of Despereaux|172th Highest', 26957280 ],
Using a for loop to put the names and profits of the top 20 highest profitable movies into HTML code, which will be pasted in the cell below.
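The 'Name|Nth Highest' entries described above combine a rank label with the profit. A minimal sketch of generating them follows; the helper names `ordinal` and `ranked_rows` are hypothetical, and the three sample movies are taken from the dataframe preview rather than from the full top 20.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, for example 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th, 111th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def ranked_rows(names, profits, top=20):
    """Return ['Name|Nth Highest', profit] lists, highest profit first."""
    pairs = sorted(zip(names, profits), key=lambda pair: pair[1], reverse=True)
    return [
        [f"{name}|{ordinal(rank)} Highest", int(profit)]
        for rank, (name, profit) in enumerate(pairs[:top], start=1)
    ]


rows = ranked_rows(
    ["Django Unchained", "Gravity", "Sing"],
    [349948323.0, 583698673.0, 559454789.0],
)

# Print each row in the list-of-lists form used in the chart's data section.
for row in rows:
    print(f"[{row[0]!r}, {row[1]}],")
```

Sorting before ranking keeps the label and the profit in the same row, and the `ordinal` helper handles suffixes for any rank, including ranks above 100.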
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
Creating a list of every movie's profit from the 'Drama_DataFrame' dataframe.
profit_all = []
for x in Drama_DataFrame.Profit:
    profit_all.append(x)
Creating a list of every movie's name from the 'Drama_DataFrame' dataframe.
name_all = []
for x in Drama_DataFrame.Movie:
    name_all.append(x)
Creating a list of every movie's system rating from the 'Drama_DataFrame' dataframe.
system_ratings = []
for x in Drama_DataFrame.Rating:
    system_ratings.append(x)
Creating tuples of each movie's name, profit and system rating, and collecting them in a list.
all_to = []
for i,x in enumerate(profit_all):
    all_to.append((name_all[i],x,system_ratings[i]))
After creating the 'all_to' list, the list is sorted by the 'Profit' of each movie in descending order, and the 'Top 20 Highest Profitable Movies' are printed.
all_to.sort(key=lambda i:i[1],reverse=True)
print(all_to[:20])
[('The Lion King 1994', 941214868.0, 'G'), ('Gravity', 583698673.0, 'PG-13'), ('Sing', 559454789.0, 'PG-13'), ('Tex', 544368315.0, 'PG'), ('Fifty Shades of Grey', 530998101.0, 'R'), ('Cinderella', 447351353.0, 'PG'), ('Beauty and the Beast 1991', 418656843.0, 'G'), ('Django Unchained', 349948323.0, 'R'), ('Fifty Shades Darker', 326398492.0, 'R'), ('Black Swan', 318266710.0, 'R'), ('A Quiet Place', 317522294.0, 'PG-13'), ('Fifty Shades Freed', 316350619.0, 'R'), ('Gone Girl', 307567189.0, 'R'), ('The Secret Garden', 293281000.0, 'G'), ('Wonder', 285937718.0, 'PG'), ('Wonder', 284604712.0, 'PG'), ('The Sound of Music', 278014195.0, 'G'), ('Bambi 1942', 267142000.0, 'G'), ('The Hunchback of Notre Drame', 255500000.0, 'G'), ('True Grit', 217276928.0, 'PG-13')]
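As an alternative to sorting the whole list and slicing it, the same selection can be sketched with the standard library's `heapq.nlargest`. The `all_to_sample` list below is a hypothetical five-row stand-in for `all_to`, with values copied from the dataframe preview above; it is not the full dataset.

```python
import heapq

# Hypothetical stand-in for all_to: (name, profit, rating) tuples from the preview above.
all_to_sample = [
    ("Hugo", 47784.0, "PG"),
    ("The Wolfman", -7365642.0, "R"),
    ("Gravity", 583698673.0, "PG-13"),
    ("Django Unchained", 349948323.0, "R"),
    ("Sing", 559454789.0, "PG-13"),
]

# nlargest returns the n tuples with the highest profit, already in descending order.
top_three = heapq.nlargest(3, all_to_sample, key=lambda row: row[1])
print(top_three)
```

With the real list, `heapq.nlargest(20, all_to, key=lambda row: row[1])` should give the same 20 movies as `all_to[:20]` after the descending sort.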
Getting the 'Profit' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_num = []
for i in all_to[:20]:twenty_num.append(i[1])
print(twenty_num)
[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0, 317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
Getting the 'Names' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_name = []
for i in all_to[:20]:twenty_name.append(i[0])
print(twenty_name)
['The Lion King 1994', 'Gravity', 'Sing', 'Tex', 'Fifty Shades of Grey', 'Cinderella', 'Beauty and the Beast 1991', 'Django Unchained', 'Fifty Shades Darker', 'Black Swan', 'A Quiet Place', 'Fifty Shades Freed', 'Gone Girl', 'The Secret Garden', 'Wonder', 'Wonder', 'The Sound of Music', 'Bambi 1942', 'The Hunchback of Notre Drame', 'True Grit']
Getting the 'System Ratings' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_rat = []
for i in all_to[:20]:twenty_rat.append(i[2])
print(twenty_rat)
['G', 'PG-13', 'PG-13', 'PG', 'R', 'PG', 'G', 'R', 'R', 'R', 'PG-13', 'R', 'R', 'G', 'PG', 'PG', 'G', 'G', 'G', 'PG-13']
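The printed lists are later pasted by hand into the Highcharts `categories` and `data` fields. A minimal sketch of producing paste-ready text is shown here, assuming it is acceptable to format the lists with `json.dumps`, whose output for lists of strings and numbers is also a valid JavaScript array literal. Only the first three entries of each list are used, and the variable names `categories_js` and `data_js` are illustrative.

```python
import json

# First three entries of twenty_name and twenty_num from the output above.
twenty_name = ["The Lion King 1994", "Gravity", "Sing"]
twenty_num = [941214868.0, 583698673.0, 559454789.0]

# json.dumps quotes and escapes the strings, so titles with apostrophes stay valid.
categories_js = json.dumps(twenty_name)
data_js = json.dumps(twenty_num)

print("categories: " + categories_js + ",")
print("data: " + data_js + ",")
```

This avoids retyping the values and keeps the chart in step with the dataframe if the data changes.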
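After creating the 'all_to' list, the list is sorted by the 'Profit' of each movie in descending order, and the 'Top 20 Lowest Profitable Movies' are printed. The slice [-99:-79] skips the final 79 entries of the sorted list, so the 20 movies printed here all still show a small positive profit.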
all_to.sort(key=lambda i:i[1],reverse=True)
print(all_to[-99:-79])
[('Two Girls and a Guy', 1315026.0, 'NC-17'), ('Pollyanna', 1250000.0, 'G'), ('Rabbit Hole', 1205034.0, 'PG-13'), ('Bad Lieutenant', 1038916.0, 'NC-17'), ('Whore 1991', 958404.0, 'NC-17'), ('Law of Desire', 858737.0, 'NC-17'), ('Wide Sargasso Sea', 659312.0, 'NC-17'), ('Zoot Suit', 556082.0, 'R'), ('Pink Flamingos', 401802.0, 'NC-17'), ('The Dreamers', 307113.0, 'NC-17'), ('Sound of My Voice', 294448.0, 'R'), ('Tokyo Decadence', 257845.0, 'NC-17'), ('Elles', 256669.0, 'NC-17'), ('Take Shelter', 222016.0, 'R'), ('Palo Alto', 156309.0, 'R'), ('The Dreamers', 121165.0, 'NC-17'), ('Locke', 88390.0, 'R'), ('Hugo', 47784.0, 'PG'), ('Stoker', 34913.0, 'R'), ('Whore', 8404.0, 'NC-17')]
Getting the 'Profit' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_num1 = []
for i in all_to[-99:-79]:twenty_num1.append(i[1])
print(twenty_num1)
[1315026.0, 1250000.0, 1205034.0, 1038916.0, 958404.0, 858737.0, 659312.0, 556082.0, 401802.0, 307113.0, 294448.0, 257845.0, 256669.0, 222016.0, 156309.0, 121165.0, 88390.0, 47784.0, 34913.0, 8404.0]
Getting the 'Names' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_name1 = []
for i in all_to[-99:-79]:twenty_name1.append(i[0])
print(twenty_name1)
['Two Girls and a Guy', 'Pollyanna', 'Rabbit Hole', 'Bad Lieutenant', 'Whore 1991', 'Law of Desire', 'Wide Sargasso Sea', 'Zoot Suit', 'Pink Flamingos', 'The Dreamers', 'Sound of My Voice', 'Tokyo Decadence', 'Elles', 'Take Shelter', 'Palo Alto', 'The Dreamers', 'Locke', 'Hugo', 'Stoker', 'Whore']
Getting the 'System Ratings' of the movies that are the 'Top 20 Lowest Profitable Movies' in the 'Drama_DataFrame' dataframe.
twenty_rat1 = []
for i in all_to[-99:-79]:twenty_rat1.append(i[2])
print(twenty_rat1)
['NC-17', 'G', 'PG-13', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'R', 'NC-17', 'NC-17', 'R', 'NC-17', 'NC-17', 'R', 'R', 'NC-17', 'R', 'PG', 'R', 'NC-17']
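The pie charts further down use the percentage share of each system rating. One way to derive those shares from the list above, rather than counting by hand, is sketched here with `collections.Counter`. The `shares` name is illustrative; the ratings list is copied from the printed output.

```python
from collections import Counter

# twenty_rat1 exactly as printed above.
twenty_rat1 = ['NC-17', 'G', 'PG-13', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'R',
               'NC-17', 'NC-17', 'R', 'NC-17', 'NC-17', 'R', 'R', 'NC-17', 'R',
               'PG', 'R', 'NC-17']

# Count each rating, then convert the counts to percentages of the 20 movies.
counts = Counter(twenty_rat1)
shares = {rating: 100 * n / len(twenty_rat1) for rating, n in counts.items()}
print(shares)
```

The resulting values (55% NC-17, 30% R and 5% each for G, PG and PG-13) agree with the `y` values used in the 'Top 20 Lowest Profitable Movies' pie chart.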
Matching HTML hex colors to the System Rating of each movie in the 'Top 20 Lowest Profitable Movies', for the graph below.
color1 = []
for i in all_to[-99:-79]:
    if i[2] == 'NC-17':color1.append("#DC143C")
    if i[2] == 'R':color1.append("#8B0000")
    if i[2] == 'PG':color1.append("#CD5C5C")
    if i[2] == 'PG-13':color1.append("#FAB072")
    if i[2] == 'G':color1.append("#A45A52")
print(color1)
['#DC143C', '#A45A52', '#FAB072', '#DC143C', '#DC143C', '#DC143C', '#DC143C', '#8B0000', '#DC143C', '#DC143C', '#8B0000', '#DC143C', '#DC143C', '#8B0000', '#8B0000', '#DC143C', '#8B0000', '#CD5C5C', '#8B0000', '#DC143C']
Matching HTML hex colors to the System Rating of each movie in the 'Top 20 Highest Profitable Movies', for the graph below.
color2 = []
for i in all_to[:20]:
    if i[2] == 'NC-17':color2.append("#DC143C")
    if i[2] == 'R':color2.append("#8B0000")
    if i[2] == 'PG':color2.append("#CD5C5C")
    if i[2] == 'PG-13':color2.append("#FAB072")
    if i[2] == 'G':color2.append("#A45A52")
print(color2)
['#A45A52', '#FAB072', '#FAB072', '#CD5C5C', '#8B0000', '#CD5C5C', '#A45A52', '#8B0000', '#8B0000', '#8B0000', '#FAB072', '#8B0000', '#8B0000', '#A45A52', '#CD5C5C', '#CD5C5C', '#A45A52', '#A45A52', '#A45A52', '#FAB072']
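The two if-chains above could also be written as a single lookup table. The sketch below uses the same hex codes; the `RATING_COLORS` and `colors_for` names and the gray fallback color are assumptions added for illustration. A fallback matters because the if-chains append nothing for a rating they do not list (for example NR), which would leave the color list shorter than the movie list.

```python
# Rating-to-color table with the same hex codes as the if-chains above.
RATING_COLORS = {
    "NC-17": "#DC143C",
    "R": "#8B0000",
    "PG": "#CD5C5C",
    "PG-13": "#FAB072",
    "G": "#A45A52",
}


def colors_for(ratings, default="#808080"):
    """Return one hex color per rating; unknown ratings get the assumed default gray."""
    return [RATING_COLORS.get(rating, default) for rating in ratings]


print(colors_for(["G", "PG-13", "NR"]))
```

Calling `colors_for(twenty_rat)` or `colors_for(twenty_rat1)` would then always return exactly one color per movie.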
This is the HTML script from the Highcharts library to visualize the data of the 'Top 20 Highest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a '3D Cylinder Series'. This will be done using JavaScript and HTML below.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/highcharts-3d.js"></script>
<script src="https://code.highcharts.com/modules/cylinder.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="pete1"></div>
</figure>
%%js
Highcharts.chart('pete1', {
chart: {
width:650,
height:500,
type: 'cylinder',
options3d: {
enabled: true,
alpha: 15,
beta: 15,
depth: 50,
viewDistance: 25
}
},
title: {
text: 'Top 20 Highest Profitable Movies'
},
plotOptions: {
series: {
depth: 25,
colorByPoint: false,
color: "#EC5800",
}
},
xAxis: {
categories:['The Lion King 1994', 'Gravity', 'Sing', 'Tex', 'Fifty Shades of Grey', 'Cinderella', 'Beauty and the Beast 1991', 'Django Unchained', 'Fifty Shades Darker',
'Black Swan', 'A Quiet Place', 'Fifty Shades Freed', 'Gone Girl', 'The Secret Garden', 'Wonder', 'Wonder', 'The Sound of Music', 'Bambi 1942',
'The Hunchback of Notre Drame', 'True Grit'],
labels: {
skew3d: true,
style: {
fontSize: '16px'
}
}
},
series: [{
data: [941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0,
317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0],
name: 'Profit',
showInLegend: false
}]
});
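// Slider wiring carried over from the Highcharts 3D demo. This page defines no
// '#sliders' controls, so the chart is looked up by its container id and the
// handlers are attached only when those controls exist.
const chart = Highcharts.charts.find(c => c && c.renderTo.id === 'pete1');
const sliderInputs = document.querySelectorAll('#sliders input');
function showValues() {
    document.getElementById('alpha-value').innerHTML = chart.options.chart.options3d.alpha;
    document.getElementById('beta-value').innerHTML = chart.options.chart.options3d.beta;
    document.getElementById('depth-value').innerHTML = chart.options.chart.options3d.depth;
}
if (chart && sliderInputs.length > 0) {
    // Activate the sliders
    sliderInputs.forEach(input => input.addEventListener('input', e => {
        chart.options.chart.options3d[e.target.id] = parseFloat(e.target.value);
        showValues();
        chart.redraw(false);
    }));
    showValues();
}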
This is the HTML script from the Highcharts library to visualize the data of the 'Top 20 Lowest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a '3D Cylinder Series'. This will be done using JavaScript and HTML below.
%%html
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/highcharts-3d.js"></script>
<script src="https://code.highcharts.com/modules/cylinder.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="pete2"></div>
</figure>
%%js
Highcharts.chart('pete2', {
chart: {
width:650,
height:500,
type: 'cylinder',
options3d: {
enabled: true,
alpha: 15,
beta: 15,
depth: 50,
viewDistance: 25
}
},
title: {
text: 'Top 20 Lowest Profitable Movies'
},
plotOptions: {
series: {
depth: 25,
colorByPoint: false,
color: "#DC143C",
}
},
xAxis: {
categories:['Two Girls and a Guy', 'Pollyanna', 'Rabbit Hole', 'Bad Lieutenant', 'Whore 1991', 'Law of Desire', 'Wide Sargasso Sea', 'Zoot Suit', 'Pink Flamingos',
'The Dreamers', 'Sound of My Voice', 'Tokyo Decadence', 'Elles', 'Take Shelter', 'Palo Alto', 'The Dreamers', 'Locke', 'Hugo', 'Stoker', 'Whore'],
labels: {
skew3d: true,
style: {
fontSize: '16px'
}
}
},
series: [{
data: [1315026.0, 1250000.0, 1205034.0, 1038916.0, 958404.0, 858737.0, 659312.0, 556082.0, 401802.0, 307113.0, 294448.0, 257845.0, 256669.0,
222016.0, 156309.0, 121165.0, 88390.0, 47784.0, 34913.0, 8404.0],
name: 'Profit',
showInLegend: false
}]
});
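// Slider wiring carried over from the Highcharts 3D demo. This page defines no
// '#sliders' controls, so the chart is looked up by its container id and the
// handlers are attached only when those controls exist.
const chart = Highcharts.charts.find(c => c && c.renderTo.id === 'pete2');
const sliderInputs = document.querySelectorAll('#sliders input');
function showValues() {
    document.getElementById('alpha-value').innerHTML = chart.options.chart.options3d.alpha;
    document.getElementById('beta-value').innerHTML = chart.options.chart.options3d.beta;
    document.getElementById('depth-value').innerHTML = chart.options.chart.options3d.depth;
}
if (chart && sliderInputs.length > 0) {
    // Activate the sliders
    sliderInputs.forEach(input => input.addEventListener('input', e => {
        chart.options.chart.options3d[e.target.id] = parseFloat(e.target.value);
        showValues();
        chart.redraw(false);
    }));
    showValues();
}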
This is the HTML script from the Highcharts library to visualize the percentage of each 'System Rating' in the 'Top 20 Highest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. This will be done using JavaScript and HTML below.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="pete3"></div>
</figure>
%%js
Highcharts.chart('pete3', {
chart: {
width:650,
height:500,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: 'Top 20 Highest Profitable Movies'
},
tooltip: {
pointFormat: '{series.name}: <b>{point.percentage:.0f}%</b>'
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.0f} %'
},
showInLegend: true
}
},
series: [{
name: 'System Rating',
colorByPoint: true,
colors: ['#f24a0c','#b30707','#edBa66','#bf6849'],
data: [{
name: 'G',
y: 30,
selected: true,
}, {
name: 'PG',
y: 20,
sliced: true,
}, {
name: 'R',
y: 30,
sliced: true,
selected: true
}, {
name: 'PG-13',
y: 20
}]
}]
});
This is the HTML script from the Highcharts library to visualize the percentage of each 'System Rating' in the 'Top 20 Lowest Profitable Movies' of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, using a 'Pie Chart'. This will be done using JavaScript and HTML below.
%%HTML
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<figure class="highcharts-figure">
<div id="pete4"></div>
</figure>
%%js
Highcharts.chart('pete4', {
chart: {
width:650,
height:500,
styledMode: false,
plotBackgroundColor: null,
plotBorderWidth: null,
plotShadow: false,
type: 'pie'
},
title: {
text: 'Top 20 Lowest Profitable Movies'
},
tooltip: {
pointFormat: '{series.name}: <b>{point.percentage:.0f}%</b>'
},
accessibility: {
point: {
valueSuffix: '%'
}
},
plotOptions: {
pie: {
allowPointSelect: true,
cursor: 'pointer',
dataLabels: {
enabled: true,
format: '<b>{point.name}</b>: {point.percentage:.0f} %'
},
showInLegend: true
}
},
series: [{
name: 'System Rating',
colorByPoint: true,
colors: ['#ff004d','#850a33','#d97e99','#9c6671','#e6a3ba'],
data: [{
name: 'NC-17',
y: 55
}, {
name: 'PG',
y: 5,
sliced: true,
selected: true
}, {
name: 'R',
y: 30
}, {
name: 'PG-13',
y: 5,
selected: true
}, {
name: 'G',
y: 5,
sliced: true,
selected: true
}]
}]
});
Using for loops to collect the names and profits of the top 20 lowest profitable movies, so that the printed lists can be pasted into the HTML and JavaScript cells.
Just because a movie has made the most profit doesn't mean it is more profitable than a movie that made less profit or revenue. Five factors were used to better understand which movie was the most successful: the cost, the revenue, the profit, the gross profit margin percentage (the share of its revenue that each movie keeps as profit) and the profit percentage by cost (the profit expressed as a percentage of the cost).
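To make the two percentages concrete, here is a minimal sketch of the arithmetic. It assumes that profit is revenue minus cost, which matches the dataframe rows (for Hugo, 180,047,784 - 180,000,000 = 47,784). The `success_metrics` function name and the zero guards are illustrative additions, not part of the notebook.

```python
def success_metrics(cost, revenue):
    """Return (profit, gross profit margin %, profit % by cost).

    A percentage is None when its denominator is zero, since it is undefined.
    """
    profit = revenue - cost
    margin_pct = 100 * profit / revenue if revenue else None
    profit_by_cost_pct = 100 * profit / cost if cost else None
    return profit, margin_pct, profit_by_cost_pct


# The Lion King 1994: cost 45,000,000 and revenue 986,214,868 (figures from this notebook).
profit, margin, by_cost = success_metrics(45_000_000, 986_214_868)
print(profit, round(margin), round(by_cost))
```

For this movie the sketch gives a profit of 941,214,868, a margin of about 95% and a profit of about 2092% of the cost, so a film with a modest budget can outrank a bigger earner on these two measures.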
This is the blueprint for creating the eleventh, twelfth, thirteenth and fourteenth visualizations. Highcharts will be used to create these graphs.
Blueprint:
The other factor is the div id for the combination chart; <div id="x1" display; inline-block>
series:[{
type:'column',
color:"#111E6C",
name:'Profit',
data:
[2351345279,2303109231,1148258224, 1135772799,1015392272,984846267,912044677,894039076,
890069413,878000000]
},{
type:'column',
color:Highcharts.getOptions().colors[1],
name:'Revenue',
data:[2776345279,2366000000,1348258224,
1305772799,1215392272,1234846267,1027044677,1104039076,1140069413,1078000000]
},{
type:'spline',
color:'gold',
name:'Cost',
data:[425000000,62890769,200000000,170000000,200000000,250000000,115000000,210000000,
250000000,200000000],
The other factors are the two different div ids for the column chart;
<div id="n1" display; inline-block></div>
<div id="xo1" display; inline-block></div>
The Gross Profit Margin Percentage and the Profit Percentage by Cost are both plotted with a Highcharts combination chart that uses only columns. The two columns show each movie's Gross Profit Margin Percentage and Profit Percentage by Cost.
The JavaScript section for the column chart consists of the categories subsection, which holds the names of all the movies.
categories:['Zootopia| 11th Highest',
'Finding Nemo| 12th Highest',
'The Jungle Book| 13th Highest',
'The Lord of the Rings: The Fellowship of the Ring| 14th Highest',
'Ice Age: Dawn of the Dinosaurs| 15th Highest',
'Star Wars Ep. III: Revenge of the Sith| 16th Highest',
'The Hobbit: The Battle of the Five Armies| 17th Highest',
'The Twilight Saga: Breaking Dawn, Part 2| 18th Highest',
'Inside Out| 19th Highest',
'Deadpool 2| 20th Highest']
It also consists of the series subsection, which has two entries of type column: one for the gross profit margin percentage and one for the profit percentage by cost.
series:[{
type:'column',
color:'#3FE0D0',
name:'Gross Profit Margin Percentage',
data:[85.0, 97.0, 85.0, 87.0, 84.0, 80.0, 89.0, 81.0, 78.0, 81.0]
},{
type:'column',
color:'#008081',
name:'Profit Percentage by Cost',
data:[553.0, 3662.0, 574.0, 668.0, 508.0, 394.0, 793.0, 426.0, 356.0, 439.0]
}]
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
Creating a list of every movie's profit from the 'Drama_DataFrame' dataframe.
profit_all = []
for x in Drama_DataFrame.Profit:
    profit_all.append(x)
Creating a list of every movie's name from the 'Drama_DataFrame' dataframe.
name_all = []
for x in Drama_DataFrame.Movie:
    name_all.append(x)
Creating a list of every movie's revenue from the 'Drama_DataFrame' dataframe.
rev_all = []
for x in Drama_DataFrame.Worldwide_Gross:
    rev_all.append(x)
Creating a list of every movie's budget from the 'Drama_DataFrame' dataframe.
bud_all = []
for x in Drama_DataFrame.Production_Budget:
    bud_all.append(x)
Creating a list of every movie's Return On Investment percentage (profit divided by production budget) from the 'Drama_DataFrame' dataframe.
rio_per_all = []
for i,x in enumerate(Drama_DataFrame.Profit):
    j = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i])*100
    rio_per_all.append(int(round(j,0)))
Creating a list of every movie's Net Profit Margin percentage (profit divided by worldwide gross) from the 'Drama_DataFrame' dataframe.
npm_all = []
for i,x in enumerate(Drama_DataFrame.Profit):
    j = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i])*100
    npm_all.append(int(round(j,0)))
Creating tuples of each movie's name, profit, revenue, budget, ROI and NPM, and collecting them in a list.
all_all = []
for i,x in enumerate(profit_all):
    all_all.append((name_all[i],x,rev_all[i],bud_all[i],rio_per_all[i],npm_all[i]))
After creating the 'all_all' list, the list is sorted by the 'Profit' of each movie in descending order, and the 'Top 20 Highest Profitable Movies' are printed.
all_all.sort(key=lambda i:i[1],reverse=True)
print(all_all[:20])
[('The Lion King 1994', 941214868.0, 986214868, 45000000.0, 2092, 95), ('Gravity', 583698673.0, 693698673, 110000000.0, 531, 84), ('Sing', 559454789.0, 634454789, 75000000.0, 746, 88), ('Tex', 544368315.0, 549368315, 5000000.0, 10887, 99), ('Fifty Shades of Grey', 530998101.0, 570998101, 40000000.0, 1327, 93), ('Cinderella', 447351353.0, 542351353, 95000000.0, 471, 82), ('Beauty and the Beast 1991', 418656843.0, 438656843, 20000000.0, 2093, 95), ('Django Unchained', 349948323.0, 449948323, 100000000.0, 350, 78), ('Fifty Shades Darker', 326398492.0, 381398492, 55000000.0, 593, 86), ('Black Swan', 318266710.0, 331266710, 13000000.0, 2448, 96), ('A Quiet Place', 317522294.0, 334522294, 17000000.0, 1868, 95), ('Fifty Shades Freed', 316350619.0, 371350619, 55000000.0, 575, 85), ('Gone Girl', 307567189.0, 368567189, 61000000.0, 504, 83), ('The Secret Garden', 293281000.0, 311281000, 18000000.0, 1629, 94), ('Wonder', 285937718.0, 305937718, 20000000.0, 1430, 93), ('Wonder', 284604712.0, 304604712, 20000000.0, 1423, 93), ('The Sound of Music', 278014195.0, 286214195, 8200000.0, 3390, 97), ('Bambi 1942', 267142000.0, 268000000, 858000.0, 31135, 100), ('The Hunchback of Notre Drame', 255500000.0, 325500000, 70000000.0, 365, 78), ('True Grit', 217276928.0, 252276928, 35000000.0, 621, 86)]
Getting the 'Names' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_name = []
for x,i in enumerate(all_all[:20]):
    if x == 0 : top_name.append(i[0]+' | '+str(int(x+1))+'st Highest')
    elif x == 1 : top_name.append(i[0]+' | '+str(int(x+1))+'nd Highest')
    elif x == 2 : top_name.append(i[0]+' | '+str(int(x+1))+'rd Highest')
    else: top_name.append(i[0]+' | '+str(int(x+1))+'th Highest')
print(top_name)
['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest', 'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest', 'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest', 'A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest', 'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest', 'The Hunchback of Notre Drame | 19th Highest', 'True Grit | 20th Highest']
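The suffix logic above is correct for ranks 1 to 20, but it would label rank 21 as '21th'. If longer rankings are ever needed, a general helper could look like the sketch below. The `ordinal` function is a hypothetical addition, not part of the notebook, and the `names` list holds only the first four titles from the output above.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th (and the rest of the teens) always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# First four names from the ranking above.
names = ["The Lion King 1994", "Gravity", "Sing", "Tex"]
labels = [f"{name} | {ordinal(rank)} Highest" for rank, name in enumerate(names, start=1)]
print(labels)
```

For the first 20 ranks this produces the same labels as the if/elif chain.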
Getting the 'Profit' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_profit = []
for i in all_all[:20]:top_profit.append(i[1])
print(top_profit)
[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0, 418656843.0, 349948323.0, 326398492.0, 318266710.0, 317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
Getting the 'Revenue' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_rev = []
for i in all_all[:20]:top_rev.append(i[2])
print(top_rev)
[986214868, 693698673, 634454789, 549368315, 570998101, 542351353, 438656843, 449948323, 381398492, 331266710, 334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
Getting the 'Budget' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_bud = []
for i in all_all[:20]:top_bud.append(i[3])
print(top_bud)
[45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0, 95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0, 17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0]
Getting the 'ROI' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_roi = []
for i in all_all[:20]:top_roi.append(i[4])
print(top_roi)
[2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448, 1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
Getting the 'NPM' of the movies that are the 'Top 20 Highest Profitable Movies' in the 'Drama_DataFrame' dataframe.
top_npm = []
for i in all_all[:20]:top_npm.append(i[5])
print(top_npm)
[95, 84, 88, 99, 93, 82, 95, 78, 86, 96, 95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
This is the HTML script from the Highcharts library to visualize the Revenue, Profit, Cost, Return On Investment and Net Profit Margin of the movies ranked 1st to 10th in the 'Top 20 Highest Profitable Movies' of the Drama genre within the 'Drama_DataFrame' dataframe, to see which movie is the most 'Successful', using a combined column series and line chart. This will be done using JavaScript and HTML below.
%%html
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<figure class="highcharts-figure">
<div id="x" style='width:1000px' ></div>
<div id="-"style='width:1000' ></div>
</figure>
%%js
Highcharts.chart('x',{
chart: {
width: 900,
height: 350
},
title:{
text:"What Movie Is The Most Successful?"
},
xAxis:{
categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest',
'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest',
'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:false
}
},
yAxis:{
min:0,
max:1000000000,
step:250000000,
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions:{
series:{
marker:{
states:{
hover:{
radiusPlus:12,
lineWidthPlus:5
}
}
}
}
},
tooltip:{
shared:false
},
states:{
hover:{
lineWidthPlus:10
}
},
series:[{
type:'column',
color:'#C21602',
name:'Profit',
data:[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0,
418656843.0, 349948323.0, 326398492.0, 318266710.0]
},{
type:'column',
color:'#F88379',
name:'Revenue',
data:[986214868, 693698673, 634454789, 549368315, 570998101, 542351353,
438656843, 449948323, 381398492, 331266710]
},{
type:'spline',
color:'gold',
name:'Cost',
data:[45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0,
95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0],
marker:{
lineWidth: 2,
lineColor: 'gold',
fillColor: 'white',
radius:2
}
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('-',{
chart: {
width: 900,
height: 300
},
title:{
text:""
},
xAxis:{
categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest',
'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest',
'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:true
}
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
},
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
valueSuffix:'%',
}
},
series: {
dataLabels: {
enabled: true,
valueSuffix:'%',
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
tooltip:{
valueSuffix:'%',
shared:true
},
series:[{
type:'column',
color:'#F57070',
name:'Net Profit Margin',
data:[95, 84, 88, 99, 93, 82, 95, 78, 86, 96]
},{
type:'column',
color:'#EC0303',
name:'Return On Investment Percentage',
data:[2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448]
}]
});
This HTML cell loads the Highcharts library to visualize the Revenue, Profit, Cost, Return On Investment and Net Profit Margin of the 'Top 20 Highest Profitable Movies' in the Drama genre from the 'Drama_DataFrame' dataframe, to see which movie is the most 'Successful'. The charts combine column series with a line (spline) series and are built with the JavaScript and HTML below.
%%HTML
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts.js"></script>
<script src="https://code.highcharts.com/4.2.2/highcharts-more.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.highcharts.js"></script>
<figure class="highcharts-figure">
<div id="-" style='width:1000' ></div>
<div id="-" style='width:1000' ></div>
</figure>
%%js
Highcharts.chart('-',{
chart: {
width: 900,
height: 350
},
title:{
text:"What Movie Is The Most Successful?"
},
xAxis:{
categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest',
'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest',
'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:false
}
},
yAxis:{
min:0,
max:400000000,
step:250000000,
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions:{
series:{
marker:{
states:{
hover:{
radiusPlus:12,
lineWidthPlus:5
}
}
}
}
},
tooltip:{
shared:false
},
states:{
hover:{
lineWidthPlus:10
}
},
series:[{
type:'column',
color:'#C21602',
name:'Profit',
data:[317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
},{
type:'column',
color:'#F88379',
name:'Revenue',
data:[334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
},{
type:'spline',
color:'gold',
name:'Cost',
data:[17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0],
marker:{
lineWidth: 2,
lineColor: 'gold',
fillColor: 'white',
radius:2
}
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('-',{
chart: {
width: 900,
height: 310
},
title:{
text:""
},
xAxis:{
categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest',
'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest',
'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:true
}
},
yAxis:{
type: 'logarithmic',
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
tooltip:{
valueSuffix:'%',
shared:true
},
series:[{
type:'column',
color:'#F57070',
name:'Net Profit Margin',
data:[95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
},{
type:'column',
color:'#EC0303',
name:'Return On Investment Percentage',
data:[1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
}]
});
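The Return On Investment and Net Profit Margin values plotted above appear consistent with ROI = Profit / Cost × 100 and NPM = Profit / Revenue × 100, rounded to whole percentages. This is an assumption inferred from the series data, not a formula stated in this notebook. A minimal Python sketch, with the hypothetical helper names `roi_percent` and `npm_percent`, checks it against two movies from the chart:

```python
def roi_percent(profit, cost):
    """Return on investment as a percentage of the production cost (assumed formula)."""
    return profit / cost * 100


def npm_percent(profit, revenue):
    """Net profit margin as a percentage of revenue (assumed formula)."""
    return profit / revenue * 100


# (profit, revenue, cost) copied from the Profit, Revenue and Cost series above.
movies = {
    "A Quiet Place": (317522294, 334522294, 17000000),
    "Bambi 1942": (267142000, 268000000, 858000),
}

for title, (profit, revenue, cost) in movies.items():
    print(title, round(roi_percent(profit, cost)), round(npm_percent(profit, revenue)))
```

Under this assumption the sketch reproduces the plotted 1868% and 95% for A Quiet Place and 31135% and 100% for Bambi 1942.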
The objective of this analysis is to identify which system rating best suits this genre. This indicates what kind of audience is drawn to the genre, so that this audience can be made the target focus. Below is the blueprint for the last visualization of this project, which is created with Highcharts.
Blueprint:
The graph used for this visualization is a Highcharts donut chart, which is basically a hollow pie chart. This chart also has an inner pie, resulting in a hierarchical type of visualization.
The first step is the HTML section, which is very simple: it contains the div whose id names the graph, and it is where the style, such as the width and height, is set.
The second step is the JavaScript code, which is divided into two sections: the inner pie and the outer pie. The inner pie shows each system rating's share, as a percentage, of the total tickets sold in the whole parent dataframe 'all_drama_info1'. The outer pie shows the individual movies in each rating category and the number of tickets they sold.
JavaScript Section:
-First Section: The first section is a list called data. It consists of the name of each system rating and its percentage of the total tickets sold, together with the color, the sliced setting and the size of the ring. data:[{ name:'System Rating: R', y:28.0, sliced:true, color: '#4682B4',}
-Second Section: The second section is also a list called data. It requires the name of each movie, the number of tickets it sold, the option to slice that section and the color of the slice.
data:[{
name: ' Avatar ' ,
y: 213565021 ,
sliced:true,
selected: true,
color:"#4682B4",}
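The two data lists described in the blueprint can also be assembled in Python before they are pasted into the JavaScript cell. The sketch below is only an illustration under stated assumptions: it uses five (Movie, Rating, Tickets) rows taken from 'Drama_DataFrame' rather than the full data, so the percentages are shares of this sample only, and the ring sizes passed to the Highcharts pie options `size` and `innerSize` are arbitrary choices.

```python
import json
from collections import defaultdict

# Sample rows (Movie, Rating, Tickets) from Drama_DataFrame; not the full dataframe.
rows = [
    ("Hugo", "PG", 18004778),
    ("The Wolfman", "R", 14263436),
    ("Gravity", "PG-13", 69369867),
    ("Django Unchained", "R", 44994832),
    ("Sing", "PG-13", 63445479),
]

# Inner pie: each rating's share of all tickets in the sample, in percent.
totals = defaultdict(int)
for _, rating, tickets in rows:
    totals[rating] += tickets
grand_total = sum(totals.values())
inner = [
    {"name": f"System Rating: {rating}", "y": round(100 * tickets / grand_total, 1)}
    for rating, tickets in totals.items()
]

# Outer pie: one slice per movie, grouped in the same rating order as the inner pie.
rating_order = list(totals)
outer = [
    {"name": movie, "y": tickets}
    for movie, rating, tickets in sorted(rows, key=lambda row: rating_order.index(row[1]))
]

series = [
    {"type": "pie", "name": "Rating share (%)", "size": "60%", "data": inner},
    {"type": "pie", "name": "Tickets", "size": "80%", "innerSize": "60%", "data": outer},
]
print(json.dumps(series, indent=2))
```

Keeping the outer slices in the same rating order as the inner slices is what lets each movie sit next to its own rating in the donut.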
This is the 'Drama_DataFrame' dataframe.
Drama_DataFrame
| | Movie | Release_Date | Genre | Rating | Production_Budget | Production_Budget_x | Domestic_Gross | Domestic_Gross_x | Foreign_Gross | Foreign_Gross_x | Worldwide_Gross | Worldwide_Gross_x | Profit | Profit_x | Tickets | Tickets_x | Runtime | Averagerating | Company | Star | Director | Writer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hugo | Nov 23, 2011 | Drama | PG | 180000000.0 | $180,000,000 | 73864507 | $73,864,507 | 111900000.0 | $111,900,000 | 180047784 | $180,047,784 | 47784.0 | $47,784 | 18004778 | 18,004,778 | 126.0 | 7.5 | Paramount Pictures | Asa Butterfield | Martin Scorsese | John Logan |
| 1 | The Wolfman | Feb 12, 2010 | Drama | R | 150000000.0 | $150,000,000 | 62189884 | $62,189,884 | 77800000.0 | $77,800,000 | 142634358 | $142,634,358 | -7365642.0 | $-7,365,642 | 14263436 | 14,263,436 | NaN | 5.8 | NaN | Benicio Del Toro | Joe Johnston | Andrew Kevin Walker |
| 2 | Gravity | Oct 4, 2013 | Drama | PG-13 | 110000000.0 | $110,000,000 | 274092705 | $274,092,705 | 449100000.0 | $449,100,000 | 693698673 | $693,698,673 | 583698673.0 | $583,698,673 | 69369867 | 69,369,867 | 91.0 | 7.7 | Warner Bros. | Sandra Bullock | Alfonso Cuarón | Alfonso Cuarón |
| 3 | Django Unchained | Dec 25, 2012 | Drama | R | 100000000.0 | $100,000,000 | 162805434 | $162,805,434 | 262600000.0 | $262,600,000 | 449948323 | $449,948,323 | 349948323.0 | $349,948,323 | 44994832 | 44,994,832 | 165.0 | 8.4 | The Weinstein Company | Jamie Foxx | Quentin Tarantino | Quentin Tarantino |
| 4 | Sing | Dec 21, 2016 | Drama | PG-13 | 75000000.0 | $75,000,000 | 270329045 | $270,329,045 | 363800000.0 | $363,800,000 | 634454789 | $634,454,789 | 559454789.0 | $559,454,789 | 63445479 | 63,445,479 | 98.0 | 7.1 | TriStar Pictures | Lorraine Bracco | Richard Baskin | Dean Pitchford |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 301 | A Dirty Shame | September 24, 2004 | Drama | NC-17 | 15000000.0 | $15,000,000 | 1339668 | $1,339,668 | 574498.0 | $574,498 | 1914166 | $1,914,166 | -13085834.0 | $-13,085,834 | 191417 | 191,417 | 84.0 | 5.1 | Killer Films | Suzanne Shepherd | John Waters | John Waters |
| 302 | Young Adam | April 16, 2004 | Drama | NC-17 | 6400000.0 | $6,400,000 | 767373 | $767,373 | 1794447.0 | $1,794,447 | 2561820 | $2,561,820 | -3838180.0 | $-3,838,180 | 256182 | 256,182 | 98.0 | 6.4 | Recorded Picture Company | Tilda Swinton | David Mackenzie | \tDavid Mackenzie |
| 303 | Whore 1991 | October 4, 1991 | Drama | NC-17 | 50000.0 | $50,000 | 0 | $0 | 0.0 | $0 | 1008404 | $1,008,404 | 958404.0 | $958,404 | 100840 | 100,840 | 80.0 | 5.5 | Cheap Date | Theresa Russell | Ken Russell | Deborah Dalton |
| 304 | Ma Mère | May 13, 2005 | Drama | NC-17 | 3259572.0 | $3,259,572 | 71616 | $71,616 | 950532.0 | $950,532 | 1022148 | $1,022,148 | -2237424.0 | $-2,237,424 | 102215 | 102,215 | 110.0 | 5.0 | Gemini Films | Louis Garrel | Christophe Honoré | Christophe Honoré |
| 305 | Law of Desire | April 3, 1987 | Drama | NC-17 | 612072.0 | $612,072 | 0 | $0 | 0.0 | $0 | 1470809 | $1,470,809 | 858737.0 | $858,737 | 147081 | 147,081 | 82.0 | 7.1 | El Deseo | Antonio Banderas | Pedro Almodóvar | Pedro Almodóvar |
306 rows × 22 columns
Getting the number of Tickets sold in the R-rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.
var = []
# Collect (Movie, Tickets) pairs for every R-rated row
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'R':
        var.append((Drama_DataFrame.Movie[i], Drama_DataFrame.Tickets[i]))
print(var)
[('The Wolfman', 14263436), ('Django Unchained', 44994832), ('Downsizing', 5446297), ('Gone Girl', 36856719), ('Priest', 8415403), ('Fifty Shades Darker', 38139849), ('Fifty Shades Freed', 37135062), ('Crimson Peak', 7496685), ('Zero Dark Thirty', 13461244), ('Fifty Shades of Grey', 57099810), ('The Master', 5064742), ('Biutiful', 2468752), ('Flight', 16055844), ('Tulip Fever', 679277), ('The Ides of March', 7773592), ('Nocturnal Animals', 3239868), ('The Water Diviner', 3105473), ('Stone', 406502), ('For Colored Girls', 3801787), ('The Debt', 4660405), ('Let Me In', 2827040), ('By the Sea', 372775), ('Miss Sloane', 771963), ('The Homesman', 821757), ('The Immigrant', 758501), ('Never Let Me Go', 1117372), ('The Reluctant Fundamentalist', 52873), ('Black Swan', 33126671), ('Ex Machina', 3835839), ('Room', 3626278), ('Chloe', 1183113), ('If Beale Street Could Talk', 1985917), ('Arbitrage', 3583071), ('Stoker', 1203491), ('Carol', 4284352), ('Quartet', 5617894), ('Hereditary', 7013390), ('Coriolanus', 217962), ('Melancholia', 2181730), ('Manchester by the Sea', 7773387), ('We Need to Talk About Kevin', 1076528), ('Hesher', 38295), ('Addicted', 1749924), ('Everything Must Go', 282101), ('Mommy', 1753600), ('Take Shelter', 497202), ('Boyhood', 5727305), ('Stake Land', 67948), ('The Witch', 4045452), ('Margin Call', 2043323), ('Whiplash', 3896904), ('Before Midnight', 2325193), ('Silent House', 1661076), ("Winter's Bone", 1613155), ('The Florida Project', 1129532), ('We Are Your Friends', 1015342), ('Locke', 208839), ('Knock Knock', 632852), ('Buried', 2127029), ('Unsane', 1424493), ('Blue Valentine', 1656624), ('Martha Marcy May Marlene', 543891), ('Palo Alto', 115631), ('I Origins', 85240), ('The Canyons', 6238), ('Sound of My Voice', 42945), ('A Ghost Story', 276978), ('Ordinary People', 5476692), ('Fame', 7721184), ('Endless Love', 3471817), ('Ghost Story', 195168), ('One from the Heart', 63680), ('The Hand', 244758), ('Pennies from Heaven', 917129), ('Zoot Suit', 
325608), ('Rich and Famous', 1300000), ('Raggedy Man', 1100000)]
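The loop above reads `Drama_DataFrame.Movie[i]` using the position from `enumerate`, which works as long as the dataframe keeps its default 0, 1, 2, ... index. A minimal alternative sketch walks the three columns together with `zip`, so no index lookup is needed. The helper name `movies_with_rating` is hypothetical, and plain lists stand in for the dataframe columns here (iterating over a pandas Series yields its values in the same way).

```python
def movies_with_rating(movies, ratings, tickets, wanted):
    """Return (movie, tickets) pairs for rows whose rating equals `wanted`."""
    return [(m, t) for m, r, t in zip(movies, ratings, tickets) if r == wanted]


# First three rows of Drama_DataFrame, used as a small stand-in for the columns.
movies = ["Hugo", "The Wolfman", "Gravity"]
ratings = ["PG", "R", "PG-13"]
tickets = [18004778, 14263436, 69369867]

r_rated = movies_with_rating(movies, ratings, tickets, "R")
print(r_rated)
```

With the real dataframe the call would be `movies_with_rating(Drama_DataFrame.Movie, Drama_DataFrame.Rating, Drama_DataFrame.Tickets, 'R')`, and the same helper covers the other ratings by changing the last argument.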
After creating the 'var' list, the list is sorted by the 'Tickets' sold by each movie, in descending order.
var.sort(key=lambda i:i[1],reverse=True)
print(var)
[('Fifty Shades of Grey', 57099810), ('Django Unchained', 44994832), ('Fifty Shades Darker', 38139849), ('Fifty Shades Freed', 37135062), ('Gone Girl', 36856719), ('Black Swan', 33126671), ('Flight', 16055844), ('The Wolfman', 14263436), ('Zero Dark Thirty', 13461244), ('Priest', 8415403), ('The Ides of March', 7773592), ('Manchester by the Sea', 7773387), ('Fame', 7721184), ('Crimson Peak', 7496685), ('Hereditary', 7013390), ('Boyhood', 5727305), ('Quartet', 5617894), ('Ordinary People', 5476692), ('Downsizing', 5446297), ('The Master', 5064742), ('The Debt', 4660405), ('Carol', 4284352), ('The Witch', 4045452), ('Whiplash', 3896904), ('Ex Machina', 3835839), ('For Colored Girls', 3801787), ('Room', 3626278), ('Arbitrage', 3583071), ('Endless Love', 3471817), ('Nocturnal Animals', 3239868), ('The Water Diviner', 3105473), ('Let Me In', 2827040), ('Biutiful', 2468752), ('Before Midnight', 2325193), ('Melancholia', 2181730), ('Buried', 2127029), ('Margin Call', 2043323), ('If Beale Street Could Talk', 1985917), ('Mommy', 1753600), ('Addicted', 1749924), ('Silent House', 1661076), ('Blue Valentine', 1656624), ("Winter's Bone", 1613155), ('Unsane', 1424493), ('Rich and Famous', 1300000), ('Stoker', 1203491), ('Chloe', 1183113), ('The Florida Project', 1129532), ('Never Let Me Go', 1117372), ('Raggedy Man', 1100000), ('We Need to Talk About Kevin', 1076528), ('We Are Your Friends', 1015342), ('Pennies from Heaven', 917129), ('The Homesman', 821757), ('Miss Sloane', 771963), ('The Immigrant', 758501), ('Tulip Fever', 679277), ('Knock Knock', 632852), ('Martha Marcy May Marlene', 543891), ('Take Shelter', 497202), ('Stone', 406502), ('By the Sea', 372775), ('Zoot Suit', 325608), ('Everything Must Go', 282101), ('A Ghost Story', 276978), ('The Hand', 244758), ('Coriolanus', 217962), ('Locke', 208839), ('Ghost Story', 195168), ('Palo Alto', 115631), ('I Origins', 85240), ('Stake Land', 67948), ('One from the Heart', 63680), ('The Reluctant Fundamentalist', 52873), ('Sound 
of My Voice', 42945), ('Hesher', 38295), ('The Canyons', 6238)]
all_to = []
for i in var:all_to.append(i[1])
print(sum(all_to))
449780631
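The sorting and totalling steps above can be written without the intermediate `all_to` list. The sketch below is a compact equivalent on three R-rated pairs from the output above. `R_TOTAL` is the total printed above, and the final line shows how each movie's share of that total is obtained.

```python
from operator import itemgetter

# Three (Movie, Tickets) pairs from the R-rated list above.
pairs = [
    ("Django Unchained", 44994832),
    ("Fifty Shades Darker", 38139849),
    ("Fifty Shades of Grey", 57099810),
]
R_TOTAL = 449780631  # total R-rated tickets printed above

# sorted() returns a new list; itemgetter(1) sorts by the ticket count.
ranked = sorted(pairs, key=itemgetter(1), reverse=True)
sample_total = sum(tickets for _, tickets in ranked)
top_share = ranked[0][1] / R_TOTAL

print(ranked)
print(sample_total)
print(top_share)
```

Using `sorted` instead of `list.sort` leaves the original list untouched, which helps when the same list is reused in a later cell.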
Using a for loop to format the name of each R-rated movie and its share of the R-rated tickets sold (its Tickets divided by the total of 449780631) as Highcharts data entries. The printed output is pasted into the cell below to create the interactive JavaScript graph.
for i,x in enumerate(range(len(var))):
    print(' },{ \n name:',"'",var[i][0],"'",','+'\n y:',var[i][1]/449780631,','+'\n color:"#581845",')
},{
name: ' Fifty Shades of Grey ' ,
y: 0.126950353271215 ,
color:"#581845",
},{
name: ' Django Unchained ' ,
y: 0.1000372823968936 ,
color:"#581845",
},{
name: ' Fifty Shades Darker ' ,
y: 0.08479655719100986 ,
color:"#581845",
},{
name: ' Fifty Shades Freed ' ,
y: 0.08256260817064842 ,
color:"#581845",
},{
name: ' Gone Girl ' ,
y: 0.08194376649358207 ,
color:"#581845",
},{
name: ' Black Swan ' ,
y: 0.07365072819242854 ,
color:"#581845",
},{
name: ' Flight ' ,
y: 0.035697055171768834 ,
color:"#581845",
},{
name: ' The Wolfman ' ,
y: 0.03171198361362964 ,
color:"#581845",
},{
name: ' Zero Dark Thirty ' ,
y: 0.02992846528333498 ,
color:"#581845",
},{
name: ' Priest ' ,
y: 0.018710016439102733 ,
color:"#581845",
},{
name: ' The Ides of March ' ,
y: 0.017283074157099485 ,
color:"#581845",
},{
name: ' Manchester by the Sea ' ,
y: 0.01728261837935836 ,
color:"#581845",
},{
name: ' Fame ' ,
y: 0.0171665551334068 ,
color:"#581845",
},{
name: ' Crimson Peak ' ,
y: 0.016667425147527084 ,
color:"#581845",
},{
name: ' Hereditary ' ,
y: 0.015592912448024023 ,
color:"#581845",
},{
name: ' Boyhood ' ,
y: 0.01273355188120584 ,
color:"#581845",
},{
name: ' Quartet ' ,
y: 0.012490297742501055 ,
color:"#581845",
},{
name: ' Ordinary People ' ,
y: 0.012176362481024666 ,
color:"#581845",
},{
name: ' Downsizing ' ,
y: 0.012108785093504838 ,
color:"#581845",
},{
name: ' The Master ' ,
y: 0.011260471551964184 ,
color:"#581845",
},{
name: ' The Debt ' ,
y: 0.010361506651894932 ,
color:"#581845",
},{
name: ' Carol ' ,
y: 0.009525425740264925 ,
color:"#581845",
},{
name: ' The Witch ' ,
y: 0.008994277923897528 ,
color:"#581845",
},{
name: ' Whiplash ' ,
y: 0.0086640102561464 ,
color:"#581845",
},{
name: ' Ex Machina ' ,
y: 0.008528244071941818 ,
color:"#581845",
},{
name: ' For Colored Girls ' ,
y: 0.008452536054181487 ,
color:"#581845",
},{
name: ' Room ' ,
y: 0.008062325831900929 ,
color:"#581845",
},{
name: ' Arbitrage ' ,
y: 0.00796626344721367 ,
color:"#581845",
},{
name: ' Endless Love ' ,
y: 0.007718911755450848 ,
color:"#581845",
},{
name: ' Nocturnal Animals ' ,
y: 0.007203218139466748 ,
color:"#581845",
},{
name: ' The Water Diviner ' ,
y: 0.006904416922301841 ,
color:"#581845",
},{
name: ' Let Me In ' ,
y: 0.006285375147690608 ,
color:"#581845",
},{
name: ' Biutiful ' ,
y: 0.005488791268114878 ,
color:"#581845",
},{
name: ' Before Midnight ' ,
y: 0.005169615674268552 ,
color:"#581845",
},{
name: ' Melancholia ' ,
y: 0.004850653517803438 ,
color:"#581845",
},{
name: ' Buried ' ,
y: 0.00472903645332829 ,
color:"#581845",
},{
name: ' Margin Call ' ,
y: 0.004542932396748761 ,
color:"#581845",
},{
name: ' If Beale Street Could Talk ' ,
y: 0.004415301289396786 ,
color:"#581845",
},{
name: ' Mommy ' ,
y: 0.0038987894967847116 ,
color:"#581845",
},{
name: ' Addicted ' ,
y: 0.00389061662372918 ,
color:"#581845",
},{
name: ' Silent House ' ,
y: 0.0036930803274185455 ,
color:"#581845",
},{
name: ' Blue Valentine ' ,
y: 0.0036831821688648927 ,
color:"#581845",
},{
name: ' Winter's Bone ' ,
y: 0.0035865372779914128 ,
color:"#581845",
},{
name: ' Unsane ' ,
y: 0.0031670839111789142 ,
color:"#581845",
},{
name: ' Rich and Famous ' ,
y: 0.0028902978705634837 ,
color:"#581845",
},{
name: ' Stoker ' ,
y: 0.0026757288265710135 ,
color:"#581845",
},{
name: ' Chloe ' ,
y: 0.0026304222957969038 ,
color:"#581845",
},{
name: ' The Florida Project ' ,
y: 0.0025112953341025483 ,
color:"#581845",
},{
name: ' Never Let Me Go ' ,
y: 0.0024842599324825083 ,
color:"#581845",
},{
name: ' Raggedy Man ' ,
y: 0.002445636659707563 ,
color:"#581845",
},{
name: ' We Need to Talk About Kevin ' ,
y: 0.0023934512200015122 ,
color:"#581845",
},{
name: ' We Are Your Friends ' ,
y: 0.0022574160157643607 ,
color:"#581845",
},{
name: ' Pennies from Heaven ' ,
y: 0.002039058458255398 ,
color:"#581845",
},{
name: ' The Homesman ' ,
y: 0.0018270173132466435 ,
color:"#581845",
},{
name: ' Miss Sloane ' ,
y: 0.0017163100115798451 ,
color:"#581845",
},{
name: ' The Immigrant ' ,
y: 0.001686379865477133 ,
color:"#581845",
},{
name: ' Tulip Fever ' ,
y: 0.0015102406666328858 ,
color:"#581845",
},{
name: ' Knock Knock ' ,
y: 0.0014070236830629552 ,
color:"#581845",
},{
name: ' Martha Marcy May Marlene ' ,
y: 0.0012092361531681874 ,
color:"#581845",
},{
name: ' Take Shelter ' ,
y: 0.0011054322167999271 ,
color:"#581845",
},{
name: ' Stone ' ,
y: 0.000903778357676767 ,
color:"#581845",
},{
name: ' By the Sea ' ,
y: 0.0008287929143840789 ,
color:"#581845",
},{
name: ' Zoot Suit ' ,
y: 0.000723926237721873 ,
color:"#581845",
},{
name: ' Everything Must Go ' ,
y: 0.0006271968612183303 ,
color:"#581845",
},{
name: ' A Ghost Story ' ,
y: 0.0006158068643022558 ,
color:"#581845",
},{
name: ' The Hand ' ,
y: 0.000544171943233367 ,
color:"#581845",
},{
name: ' Coriolanus ' ,
y: 0.0004845962342028908 ,
color:"#581845",
},{
name: ' Locke ' ,
y: 0.000464313013069698 ,
color:"#581845",
},{
name: ' Ghost Story ' ,
y: 0.0004339181960016415 ,
color:"#581845",
},{
name: ' Palo Alto ' ,
y: 0.00025708310236240476 ,
color:"#581845",
},{
name: ' I Origins ' ,
y: 0.00018951460806679334 ,
color:"#581845",
},{
name: ' Stake Land ' ,
y: 0.00015106919977619045 ,
color:"#581845",
},{
name: ' One from the Heart ' ,
y: 0.0001415801295365251 ,
color:"#581845",
},{
name: ' The Reluctant Fundamentalist ' ,
y: 0.00011755286100792544 ,
color:"#581845",
},{
name: ' Sound of My Voice ' ,
y: 9.547987850103754e-05 ,
color:"#581845",
},{
name: ' Hesher ' ,
y: 8.514150534863738e-05 ,
color:"#581845",
},{
name: ' The Canyons ' ,
y: 1.3868983166596162e-05 ,
color:"#581845",
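Because the entries above are built by joining strings, a title containing an apostrophe (such as Winter's Bone) ends its quoted name early once pasted into JavaScript, and every name picks up a leading and trailing space. One way around this, sketched below as a suggestion rather than the notebook's own method, is to build the entries as Python dictionaries and let `json.dumps` handle the quoting. The helper name `to_highcharts_points` is hypothetical, and the shares are computed against the two sample rows only.

```python
import json


def to_highcharts_points(pairs, color):
    """Turn (name, tickets) pairs into Highcharts point dicts with y as a share of the total."""
    total = sum(tickets for _, tickets in pairs)
    return [
        {"name": name.strip(), "y": tickets / total, "color": color}
        for name, tickets in pairs
    ]


# Two R-rated pairs from the output above, one with an apostrophe in the title.
sample = [("Fifty Shades of Grey", 57099810), ("Winter's Bone", 1613155)]
points = to_highcharts_points(sample, "#581845")

# JSON text is valid JavaScript literal syntax, so it can be pasted after `data:`.
print(json.dumps(points, indent=2))
```

The printed list can then replace the hand-assembled block in the `%%js` cell.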
Getting the number of Tickets sold in the NC-17 rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.
var1 = []
# Collect (Movie, Tickets) pairs for every NC-17 rated row
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'NC-17':
        var1.append((Drama_DataFrame.Movie[i], Drama_DataFrame.Tickets[i]))
print(var1)
[('Shame', 2041284), ('Matador', 1735627), ('Whore', 100840), ('Tokyo Decadence', 27784), ('Wide Sargasso Sea', 161478), ('Kids', 2041222), ('Showgirls', 2035075), ('Crash', 9841006), ('Bent', 49606), ('The Dreamers', 1512116), ('Ma mère', 102215), ('Lust, Caution', 6709192), ('Shame', 2041284), ('Blue Is the Warmest Colour', 1946584), ('Showgirls', 3775075), ('The Dreamers', 1530711), ('Shame', 2041284), ('Blue Is the Warmest Colour', 1946584), ('Blue Valentine', 1656624), ('Two Girls and a Guy', 231503), ('Elles', 382224), ('Hell', 21312000), ('Killer Joe', 465911), ('Se, jie', 6516743), ('Queen of Hearts', 123684), ('The Evil Dead', 266194), ('Man Bites Dog', 20557), ('Shame', 2041284), ('Nymphomaniac: Vol. I', 209430), ('Arabian Nights', 345342), ('Frontier(s)', 278354), ('Chained', 10309), ('Natural Born Killers', 5028356), ('Clerks', 389424), ('Bad Lieutenant', 203892), ('The Big Feast', 69087), ('Beyond the Valley of the Dolls', 900000), ('Kids', 2041222), ('Crash', 10117304), ('Last Tango in Paris', 3614771), ('Pink Flamingos', 41380), ('Lust, Caution ', 6516743), ('Happiness 1998', 574645), ('Orgazmo', 62729), ('A Dirty Shame', 191417), ('Young Adam', 256182), ('Whore 1991', 100840), ('Ma Mère', 102215), ('Law of Desire', 147081)]
After creating the 'var1' list, the list is sorted by the 'Tickets' sold by each movie, in descending order.
var1.sort(key=lambda i:i[1],reverse=True)
print(var1)
[('Hell', 21312000), ('Crash', 10117304), ('Crash', 9841006), ('Lust, Caution', 6709192), ('Se, jie', 6516743), ('Lust, Caution ', 6516743), ('Natural Born Killers', 5028356), ('Showgirls', 3775075), ('Last Tango in Paris', 3614771), ('Shame', 2041284), ('Shame', 2041284), ('Shame', 2041284), ('Shame', 2041284), ('Kids', 2041222), ('Kids', 2041222), ('Showgirls', 2035075), ('Blue Is the Warmest Colour', 1946584), ('Blue Is the Warmest Colour', 1946584), ('Matador', 1735627), ('Blue Valentine', 1656624), ('The Dreamers', 1530711), ('The Dreamers', 1512116), ('Beyond the Valley of the Dolls', 900000), ('Happiness 1998', 574645), ('Killer Joe', 465911), ('Clerks', 389424), ('Elles', 382224), ('Arabian Nights', 345342), ('Frontier(s)', 278354), ('The Evil Dead', 266194), ('Young Adam', 256182), ('Two Girls and a Guy', 231503), ('Nymphomaniac: Vol. I', 209430), ('Bad Lieutenant', 203892), ('A Dirty Shame', 191417), ('Wide Sargasso Sea', 161478), ('Law of Desire', 147081), ('Queen of Hearts', 123684), ('Ma mère', 102215), ('Ma Mère', 102215), ('Whore', 100840), ('Whore 1991', 100840), ('The Big Feast', 69087), ('Orgazmo', 62729), ('Bent', 49606), ('Pink Flamingos', 41380), ('Tokyo Decadence', 27784), ('Man Bites Dog', 20557), ('Chained', 10309)]
all_to = []
for i in var1:all_to.append(i[1])
print(sum(all_to))
103856414
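The sorted NC-17 list above contains repeated pairs, for example ('Shame', 2041284) four times and ('Kids', 2041222) twice, and each repeat is counted in the total. The notebook does not say whether these are duplicate rows or distinct entries. If they are duplicates, a minimal sketch such as the following (the helper name `drop_exact_duplicates` is hypothetical) removes exact repeats while keeping the order, so that the total and the shares are not inflated.

```python
def drop_exact_duplicates(pairs):
    """Keep the first occurrence of each identical (name, tickets) pair, preserving order."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(pairs))


# A few (Movie, Tickets) pairs copied from the NC-17 output above.
pairs = [
    ("Crash", 10117304), ("Crash", 9841006),
    ("Shame", 2041284), ("Shame", 2041284), ("Shame", 2041284), ("Shame", 2041284),
    ("Kids", 2041222), ("Kids", 2041222),
]

unique_pairs = drop_exact_duplicates(pairs)
print(unique_pairs)
print(sum(tickets for _, tickets in unique_pairs))
```

Pairs that share a title but differ in ticket count, such as the two Crash entries, are kept, since they may be different films.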
Using a for loop to format the name of each NC-17 rated movie and its share of the NC-17 rated tickets sold (its Tickets divided by the total of 103856414) as Highcharts data entries. The printed output is pasted into the cell below to create the interactive JavaScript graph.
for i,x in enumerate(range(len(var1))):
    print(' },{ \n name:',"'",var1[i][0],"'",','+'\n y:',var1[i][1]/103856414,',','\n color:"#FF5733",')
},{
name: ' Hell ' ,
y: 0.20520639197113044 ,
color:"#FF5733",
},{
name: ' Crash ' ,
y: 0.09741626549901868 ,
color:"#FF5733",
},{
name: ' Crash ' ,
y: 0.09475588094154686 ,
color:"#FF5733",
},{
name: ' Lust, Caution ' ,
y: 0.06460065143400773 ,
color:"#FF5733",
},{
name: ' Se, jie ' ,
y: 0.06274762192347601 ,
color:"#FF5733",
},{
name: ' Lust, Caution ' ,
y: 0.06274762192347601 ,
color:"#FF5733",
},{
name: ' Natural Born Killers ' ,
y: 0.0484164223116735 ,
color:"#FF5733",
},{
name: ' Showgirls ' ,
y: 0.036348982740728945 ,
color:"#FF5733",
},{
name: ' Last Tango in Paris ' ,
y: 0.034805467094213366 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Kids ' ,
y: 0.019654269980860305 ,
color:"#FF5733",
},{
name: ' Kids ' ,
y: 0.019654269980860305 ,
color:"#FF5733",
},{
name: ' Showgirls ' ,
y: 0.01959508249533823 ,
color:"#FF5733",
},{
name: ' Blue Is the Warmest Colour ' ,
y: 0.018743031123720486 ,
color:"#FF5733",
},{
name: ' Blue Is the Warmest Colour ' ,
y: 0.018743031123720486 ,
color:"#FF5733",
},{
name: ' Matador ' ,
y: 0.016711794035176298 ,
color:"#FF5733",
},{
name: ' Blue Valentine ' ,
y: 0.015951099563287444 ,
color:"#FF5733",
},{
name: ' The Dreamers ' ,
y: 0.01473872379225418 ,
color:"#FF5733",
},{
name: ' The Dreamers ' ,
y: 0.014559678519229444 ,
color:"#FF5733",
},{
name: ' Beyond the Valley of the Dolls ' ,
y: 0.00866581047175382 ,
color:"#FF5733",
},{
name: ' Happiness 1998 ' ,
y: 0.005533071842823304 ,
color:"#FF5733",
},{
name: ' Killer Joe ' ,
y: 0.0044861071363392156 ,
color:"#FF5733",
},{
name: ' Clerks ' ,
y: 0.003749638419058066 ,
color:"#FF5733",
},{
name: ' Elles ' ,
y: 0.0036803119352840355 ,
color:"#FF5733",
},{
name: ' Arabian Nights ' ,
y: 0.003325187022151564 ,
color:"#FF5733",
},{
name: ' Frontier(s) ' ,
y: 0.0026801811200606253 ,
color:"#FF5733",
},{
name: ' The Evil Dead ' ,
y: 0.0025630963919089293 ,
color:"#FF5733",
},{
name: ' Young Adam ' ,
y: 0.002466694064749819 ,
color:"#FF5733",
},{
name: ' Two Girls and a Guy ' ,
y: 0.002229067912936027 ,
color:"#FF5733",
},{
name: ' Nymphomaniac: Vol. I ' ,
y: 0.002016534096777114 ,
color:"#FF5733",
},{
name: ' Bad Lieutenant ' ,
y: 0.001963210476340922 ,
color:"#FF5733",
},{
name: ' A Dirty Shame ' ,
y: 0.0018430927145241121 ,
color:"#FF5733",
},{
name: ' Wide Sargasso Sea ' ,
y: 0.0015548197148420703 ,
color:"#FF5733",
},{
name: ' Law of Desire ' ,
y: 0.001416195633328915 ,
color:"#FF5733",
},{
name: ' Queen of Hearts ' ,
y: 0.0011909134470982216 ,
color:"#FF5733",
},{
name: ' Ma mère ' ,
y: 0.0009841953526336853 ,
color:"#FF5733",
},{
name: ' Ma Mère ' ,
y: 0.0009841953526336853 ,
color:"#FF5733",
},{
name: ' Whore ' ,
y: 0.0009709559199685058 ,
color:"#FF5733",
},{
name: ' Whore 1991 ' ,
y: 0.0009709559199685058 ,
color:"#FF5733",
},{
name: ' The Big Feast ' ,
y: 0.0006652164978467291 ,
color:"#FF5733",
},{
name: ' Orgazmo ' ,
y: 0.0006039973612029393 ,
color:"#FF5733",
},{
name: ' Bent ' ,
y: 0.00047764021584646666 ,
color:"#FF5733",
},{
name: ' Pink Flamingos ' ,
y: 0.00039843470813463673 ,
color:"#FF5733",
},{
name: ' Tokyo Decadence ' ,
y: 0.0002675231979413424 ,
color:"#FF5733",
},{
name: ' Man Bites Dog ' ,
y: 0.0001979367398531592 ,
color:"#FF5733",
},{
name: ' Chained ' ,
y: 9.926204461478904e-05 ,
color:"#FF5733",
Getting the number of Tickets sold in the PG-rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.
var2 = []
# Collect (Movie, Tickets) pairs for every PG-rated row
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG':
        var2.append((Drama_DataFrame.Movie[i], Drama_DataFrame.Tickets[i]))
print(var2)
[('Hugo', 18004778), ('Dolphin Tale', 9606872), ('Extraordinary Measures', 1582698), ('Wonder', 30460471), ('The Last Song', 9267895), ('War Room', 7397524), ('The Lunchbox', 1223150), ('Somewhere in Time', 970960), ('Urban Cowboy', 4691829), ('Cinderella', 54235135), ('War Room', 7398690), ('Wonder', 30593772), ('Little Women', 21660121), ('Overcomer', 3810299), ('The Jazz Singer', 2711800), ('Cattle Annie and Little Britches', 53482), ('The Majestic', 3730633), ('A Walk to Remember', 4749492), ('Tuck Everlasting', 1934462), ('Dreamer', 3874173), ('The Lake House', 11483011), ('We Are Marshall', 4354536), ('Akeelah and the Bee', 1894842), ('The Ultimate Gift', 343874), ('Bridge to Terabithia', 13758706), ('August Rush', 6460576), ('Fireproof', 3347330), ('The Last Song', 8913705), ('What If...', 852629), ("God's Not Dead", 6466787), ("Mr. Holland's Opus", 10626997), ('The Indian in the Cupboard', 3565613), ('Fluke', 398777), ('Three Wishes', 702550), ('Phenomenon', 15203638), ('Contact', 17112033), ('The Spanish Prisoner', 1383513), ('Music of the Heart', 1485939), ('Sense and Sensibility', 13458278), ('The Secret of Roan Inish', 610182), ('The Remains of the Day', 6395497), ('Gettysburg', 1076996), ('The Age of Innocence', 3225544), ('Pure Country', 1516446), ('Forever Young', 12795619), ('Newsies', 281948), ('A River Runs Through It', 4344029), ('Honeysuckle Rose', 1781521), ('Resurrection', 15729752), ('Taps', 3585605), ('On Golden Pond', 11928543), ('Absence of Malice', 4071696), ('Ragtime', 1492078), ('Looker', 328123), ('The Night the Lights Went Out in Georgia', 1492375), ('Rocky III', 12505269), ('Tex', 54936832), ('Six Weeks', 666802), ('Five Days One Summer', 19908), ('Staying Alive', 6489267), ('Eddie and the Cruisers', 478679), ('Tender Mercies', 844312), ('Testament', 204489), ('Table for Five', 240000), ('Man, Woman and Child', 170591), ('Footloose', 8000894), ('The Natural', 4800000)]
After creating the 'var2' list, the list is sorted by the 'Tickets' sold by each movie, in descending order.
var2.sort(key=lambda i:i[1],reverse=True)
print(var2)
[('Tex', 54936832), ('Cinderella', 54235135), ('Wonder', 30593772), ('Wonder', 30460471), ('Little Women', 21660121), ('Hugo', 18004778), ('Contact', 17112033), ('Resurrection', 15729752), ('Phenomenon', 15203638), ('Bridge to Terabithia', 13758706), ('Sense and Sensibility', 13458278), ('Forever Young', 12795619), ('Rocky III', 12505269), ('On Golden Pond', 11928543), ('The Lake House', 11483011), ("Mr. Holland's Opus", 10626997), ('Dolphin Tale', 9606872), ('The Last Song', 9267895), ('The Last Song', 8913705), ('Footloose', 8000894), ('War Room', 7398690), ('War Room', 7397524), ('Staying Alive', 6489267), ("God's Not Dead", 6466787), ('August Rush', 6460576), ('The Remains of the Day', 6395497), ('The Natural', 4800000), ('A Walk to Remember', 4749492), ('Urban Cowboy', 4691829), ('We Are Marshall', 4354536), ('A River Runs Through It', 4344029), ('Absence of Malice', 4071696), ('Dreamer', 3874173), ('Overcomer', 3810299), ('The Majestic', 3730633), ('Taps', 3585605), ('The Indian in the Cupboard', 3565613), ('Fireproof', 3347330), ('The Age of Innocence', 3225544), ('The Jazz Singer', 2711800), ('Tuck Everlasting', 1934462), ('Akeelah and the Bee', 1894842), ('Honeysuckle Rose', 1781521), ('Extraordinary Measures', 1582698), ('Pure Country', 1516446), ('The Night the Lights Went Out in Georgia', 1492375), ('Ragtime', 1492078), ('Music of the Heart', 1485939), ('The Spanish Prisoner', 1383513), ('The Lunchbox', 1223150), ('Gettysburg', 1076996), ('Somewhere in Time', 970960), ('What If...', 852629), ('Tender Mercies', 844312), ('Three Wishes', 702550), ('Six Weeks', 666802), ('The Secret of Roan Inish', 610182), ('Eddie and the Cruisers', 478679), ('Fluke', 398777), ('The Ultimate Gift', 343874), ('Looker', 328123), ('Newsies', 281948), ('Table for Five', 240000), ('Testament', 204489), ('Man, Woman and Child', 170591), ('Cattle Annie and Little Britches', 53482), ('Five Days One Summer', 19908)]
all_to = []
for i in var2:all_to.append(i[1])
print(sum(all_to))
499784567
Using a for loop to format the name of each PG-rated movie and its share of the PG-rated tickets sold (its Tickets divided by the total of 499784567) as Highcharts data entries. The printed output is pasted into the cell below to create the interactive JavaScript graph.
for i,x in enumerate(range(len(var2))):
    print(' },{ \n name:',"'",var2[i][0],"'",','+'\n y:',var2[i][1]/499784567,',','\n color:"#C70039",')
},{
name: ' Tex ' ,
y: 0.10992102523245781 ,
color:"#C70039",
},{
name: ' Cinderella ' ,
y: 0.10851702629705251 ,
color:"#C70039",
},{
name: ' Wonder ' ,
y: 0.06121391899642231 ,
color:"#C70039",
},{
name: ' Wonder ' ,
y: 0.06094720207717018 ,
color:"#C70039",
},{
name: ' Little Women ' ,
y: 0.043338915265064594 ,
color:"#C70039",
},{
name: ' Hugo ' ,
y: 0.036025077981249466 ,
color:"#C70039",
},{
name: ' Contact ' ,
y: 0.03423881834270405 ,
color:"#C70039",
},{
name: ' Resurrection ' ,
y: 0.031473064673483604 ,
color:"#C70039",
},{
name: ' Phenomenon ' ,
y: 0.03042038310878855 ,
color:"#C70039",
},{
name: ' Bridge to Terabithia ' ,
y: 0.027529273427924796 ,
color:"#C70039",
},{
name: ' Sense and Sensibility ' ,
y: 0.0269281584279092 ,
color:"#C70039",
},{
name: ' Forever Young ' ,
y: 0.02560226914729842 ,
color:"#C70039",
},{
name: ' Rocky III ' ,
y: 0.025021318835561402 ,
color:"#C70039",
},{
name: ' On Golden Pond ' ,
y: 0.023867369638086482 ,
color:"#C70039",
},{
name: ' The Lake House ' ,
y: 0.022975921543411725 ,
color:"#C70039",
},{
name: ' Mr. Holland's Opus ' ,
y: 0.021263155570788162 ,
color:"#C70039",
},{
name: ' Dolphin Tale ' ,
y: 0.019222026117505144 ,
color:"#C70039",
},{
name: ' The Last Song ' ,
y: 0.018543779884263614 ,
color:"#C70039",
},{
name: ' The Last Song ' ,
y: 0.01783509453584228 ,
color:"#C70039",
},{
name: ' Footloose ' ,
y: 0.01600868559832901 ,
color:"#C70039",
},{
name: ' War Room ' ,
y: 0.014803758436182365 ,
color:"#C70039",
},{
name: ' War Room ' ,
y: 0.01480142543096974 ,
color:"#C70039",
},{
name: ' Staying Alive ' ,
y: 0.012984128419475586 ,
color:"#C70039",
},{
name: ' God's Not Dead ' ,
y: 0.012939149039390006 ,
color:"#C70039",
},{
name: ' August Rush ' ,
y: 0.012926721684865472 ,
color:"#C70039",
},{
name: ' The Remains of the Day ' ,
y: 0.012796507580034979 ,
color:"#C70039",
},{
name: ' The Natural ' ,
y: 0.009604138096565115 ,
color:"#C70039",
},{
name: ' A Walk to Remember ' ,
y: 0.009503078553444008 ,
color:"#C70039",
},{
name: ' Urban Cowboy ' ,
y: 0.00938770284197271 ,
color:"#C70039",
},{
name: ' We Are Marshall ' ,
y: 0.00871282606051339 ,
color:"#C70039",
},{
name: ' A River Runs Through It ' ,
y: 0.008691803002392428 ,
color:"#C70039",
},{
name: ' Absence of Malice ' ,
y: 0.00814690222317329 ,
color:"#C70039",
},{
name: ' Dreamer ' ,
y: 0.007751685937913325 ,
color:"#C70039",
},{
name: ' Overcomer ' ,
y: 0.0076238828719174916 ,
color:"#C70039",
},{
name: ' The Majestic ' ,
y: 0.007464482191583959 ,
color:"#C70039",
},{
name: ' Taps ' ,
y: 0.007174301162444658 ,
color:"#C70039",
},{
name: ' The Indian in the Cupboard ' ,
y: 0.007134299927272464 ,
color:"#C70039",
},{
name: ' Fireproof ' ,
y: 0.006697545744744855 ,
color:"#C70039",
},{
name: ' The Age of Innocence ' ,
y: 0.006453868752613964 ,
color:"#C70039",
},{
name: ' The Jazz Singer ' ,
y: 0.0054259378521386 ,
color:"#C70039",
},{
name: ' Tuck Everlasting ' ,
y: 0.0038705917063661553 ,
color:"#C70039",
},{
name: ' Akeelah and the Bee ' ,
y: 0.003791317549827424 ,
color:"#C70039",
},{
name: ' Honeysuckle Rose ' ,
y: 0.0035645778554022458 ,
color:"#C70039",
},{
name: ' Extraordinary Measures ' ,
y: 0.0031667604494077946 ,
color:"#C70039",
},{
name: ' Pure Country ' ,
y: 0.0030341993333299544 ,
color:"#C70039",
},{
name: ' The Night the Lights Went Out in Georgia ' ,
y: 0.002986036581637784 ,
color:"#C70039",
},{
name: ' Ragtime ' ,
y: 0.002985442325593059 ,
color:"#C70039",
},{
name: ' Music of the Heart ' ,
y: 0.002973159033139973 ,
color:"#C70039",
},{
name: ' The Spanish Prisoner ' ,
y: 0.002768218731331894 ,
color:"#C70039",
},{
name: ' The Lunchbox ' ,
y: 0.002447354481836171 ,
color:"#C70039",
},{
name: ' Gettysburg ' ,
y: 0.002154920481968384 ,
color:"#C70039",
},{
name: ' Somewhere in Time ' ,
y: 0.0019427570679668466 ,
color:"#C70039",
},{
name: ' What If... ' ,
y: 0.0017059930544033786 ,
color:"#C70039",
},{
name: ' Tender Mercies ' ,
y: 0.001689351884288976 ,
color:"#C70039",
},{
name: ' Three Wishes ' ,
y: 0.0014057056707795462 ,
color:"#C70039",
},{
name: ' Six Weeks ' ,
y: 0.0013341788523053774 ,
color:"#C70039",
},{
name: ' The Secret of Roan Inish ' ,
y: 0.0012208900400079781 ,
color:"#C70039",
},{
name: ' Eddie and the Cruisers ' ,
y: 0.0009577706708178526 ,
color:"#C70039",
},{
name: ' Fluke ' ,
y: 0.0007978977870279055 ,
color:"#C70039",
},{
name: ' The Ultimate Gift ' ,
y: 0.0006880444549621317 ,
color:"#C70039",
},{
name: ' Looker ' ,
y: 0.000656528875970674 ,
color:"#C70039",
},{
name: ' Newsies ' ,
y: 0.000564139068343821 ,
color:"#C70039",
},{
name: ' Table for Five ' ,
y: 0.0004802069048282557 ,
color:"#C70039",
},{
name: ' Testament ' ,
y: 0.0004091542906726049 ,
color:"#C70039",
},{
name: ' Man, Woman and Child ' ,
y: 0.0003413290670898207 ,
color:"#C70039",
},{
name: ' Cattle Annie and Little Britches ' ,
y: 0.00010701010701676988 ,
color:"#C70039",
},{
name: ' Five Days One Summer ' ,
y: 3.983316275550381e-05 ,
color:"#C70039",
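The printed entries above wrap each title in single quotes, so a title that contains an apostrophe (for example "Mr. Holland's Opus") would end the JavaScript string early once pasted. A minimal sketch of one alternative, assuming the chart accepts an array of `{name, y, color}` objects; the helper name `series_entries` is hypothetical and not part of the original notebook, and the sample uses only three rows, so its shares differ from those printed above:

```python
import json

def series_entries(pairs, color):
    """Build chart data points from (title, tickets) pairs.

    Each point's y value is the movie's share of the total tickets
    in `pairs`, as a fraction between 0 and 1.
    """
    total = sum(tickets for _, tickets in pairs)
    return [
        {"name": title, "y": tickets / total, "color": color}
        for title, tickets in pairs
    ]

# Three rows copied from the PG list above, for illustration only.
sample = [
    ("Tex", 54936832),
    ("Cinderella", 54235135),
    ("Mr. Holland's Opus", 10626997),
]
points = series_entries(sample, "#C70039")

# json.dumps emits double-quoted strings, so the apostrophe is harmless.
print(json.dumps(points, indent=2))
```

Because JSON is valid JavaScript object syntax, the printed text can be pasted as the data array without hand-editing quotes.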
Getting the number of Tickets sold in the G-rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.
var3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'G':
        var3.append((Drama_DataFrame.Movie[i], Drama_DataFrame.Tickets[i]))
print(var3)
[('La traviata', 19549), ('A Sunday in the Country', 241114), ('Little Dorrit', 102523), ('Prancer', 1858714), ('The Secret Garden', 872124), ('Through the Olive Trees', 4030), ('A Little Princess', 1001545), ('The Rookie', 8069354), ('Beauty and the Beast 1991', 43865684), ('The Little Rascals', 6694795), ('Ramona and Beezus', 2746962), ('The Black Stallion', 3779964), ('The Hunchback of Notre Drame', 32550000), ('Babe', 24610000), ('Pollyanna', 375000), ('Babe: Pig in the City', 6913186), ('Lassie Come Home', 451700), ("Charlotte's Web", 14398571), ('A Little Princess', 1001545), ('Kit Kittredge: An American Girl', 1765797), ('The Rookie', 8049152), ('The Secret Garden', 31128100), ('The Sound of Music', 28621420), ('The Tale of Despereaux', 9048232), ('The Lion King 1994', 98621487), ('Bambi 1942', 26800000), ('My Fair Lady 1964', 7207164), ('Before the Wrath', 10900), ("Hachiko: A Dog's Story", 4770742), ('Giant', 3019441), ('The Ten Commandments 1966', 6550000), ('The Quiet Man', 760038), ('Three Cions in the Fountain', 1200000), ('Miracle of Marcelino', 59286)]
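The loop above looks up `Movie[i]` and `Tickets[i]` by position, which works as long as the dataframe keeps its default 0..n-1 index. A minimal sketch of the same filter written with `zip`, which pairs the columns directly and does not depend on the index; the plain lists below are stand-ins for the `Drama_DataFrame` columns, and the function name is hypothetical:

```python
def movies_with_rating(ratings, movies, tickets, wanted):
    """Return (movie, tickets) pairs whose rating equals `wanted`."""
    return [
        (movie, sold)
        for rating, movie, sold in zip(ratings, movies, tickets)
        if rating == wanted
    ]

# Four rows taken from the lists printed in this section, for illustration.
ratings = ["G", "PG", "G", "PG-13"]
movies = ["Babe", "Hugo", "Pollyanna", "Gravity"]
tickets = [24610000, 18004778, 375000, 69369867]

g_rated = movies_with_rating(ratings, movies, tickets, "G")
print(g_rated)
```

The same call with `"PG"` or `"PG-13"` would replace the near-identical loops written for each rating.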
After creating the 'var3' list, it is sorted by the number of tickets sold by each movie, in descending order.
var3.sort(key=lambda i:i[1],reverse=True)
print(var3)
[('The Lion King 1994', 98621487), ('Beauty and the Beast 1991', 43865684), ('The Hunchback of Notre Drame', 32550000), ('The Secret Garden', 31128100), ('The Sound of Music', 28621420), ('Bambi 1942', 26800000), ('Babe', 24610000), ("Charlotte's Web", 14398571), ('The Tale of Despereaux', 9048232), ('The Rookie', 8069354), ('The Rookie', 8049152), ('My Fair Lady 1964', 7207164), ('Babe: Pig in the City', 6913186), ('The Little Rascals', 6694795), ('The Ten Commandments 1966', 6550000), ("Hachiko: A Dog's Story", 4770742), ('The Black Stallion', 3779964), ('Giant', 3019441), ('Ramona and Beezus', 2746962), ('Prancer', 1858714), ('Kit Kittredge: An American Girl', 1765797), ('Three Cions in the Fountain', 1200000), ('A Little Princess', 1001545), ('A Little Princess', 1001545), ('The Secret Garden', 872124), ('The Quiet Man', 760038), ('Lassie Come Home', 451700), ('Pollyanna', 375000), ('A Sunday in the Country', 241114), ('Little Dorrit', 102523), ('Miracle of Marcelino', 59286), ('La traviata', 19549), ('Before the Wrath', 10900), ('Through the Olive Trees', 4030)]
# Collect the ticket counts (second element of each tuple) and print their total.
all_to = [tickets for _, tickets in var3]
print(sum(all_to))
377168119
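The sort-then-total steps above can be sketched more compactly. This is only an illustration on three rows copied from the G-rated list, with hypothetical variable names; `sorted` returns a new list, so the original order stays available if it is needed later:

```python
# A few (title, tickets) rows copied from the G-rated list above.
rows = [
    ("Babe", 24610000),
    ("The Lion King 1994", 98621487),
    ("Bambi 1942", 26800000),
]

# sorted() builds a new list ordered by tickets, highest first.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

# Sum the ticket counts directly, without an intermediate list.
total = sum(tickets for _, tickets in ranked)

print(ranked)
print(total)
```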
Using a for loop to print each G-rated movie's name and its share of the G ticket total (377168119) as a chart data entry. The printed text is pasted into the cell below to build the interactive JavaScript graph.
for name, tickets in var3:
    print(' },{ \n name:',"'",name,"'",','+'\n y:',tickets/377168119,',','\n sliced:true,'+'\n color:"#FFAA00",')
},{
name: ' The Lion King 1994 ' ,
y: 0.26147885261744513 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Beauty and the Beast 1991 ' ,
y: 0.11630273554483538 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Hunchback of Notre Drame ' ,
y: 0.0863010375487224 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Secret Garden ' ,
y: 0.08253110067343736 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Sound of Music ' ,
y: 0.07588504584079123 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Bambi 1942 ' ,
y: 0.07105584658389433 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Babe ' ,
y: 0.06524941732946415 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Charlotte's Web ' ,
y: 0.03817547208967575 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Tale of Despereaux ' ,
y: 0.02398991734505535 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Rookie ' ,
y: 0.021394581337878135 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Rookie ' ,
y: 0.021341019016509186 ,
sliced:true,
color:"#FFAA00",
},{
name: ' My Fair Lady 1964 ' ,
y: 0.019108624607797248 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Babe: Pig in the City ' ,
y: 0.018329189694847987 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Little Rascals ' ,
y: 0.017750161433978465 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Ten Commandments 1966 ' ,
y: 0.017366261012108503 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Hachiko: A Dog's Story ' ,
y: 0.012648847449378404 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Black Stallion ' ,
y: 0.010021960525247894 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Giant ' ,
y: 0.008005557330788077 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Ramona and Beezus ' ,
y: 0.007283123524021923 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Prancer ' ,
y: 0.004928078239825991 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Kit Kittredge: An American Girl ' ,
y: 0.004681723907847047 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Three Cions in the Fountain ' ,
y: 0.0031816050709206414 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Little Princess ' ,
y: 0.002655433875629345 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Little Princess ' ,
y: 0.002655433875629345 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Secret Garden ' ,
y: 0.002312295117392995 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Quiet Man ' ,
y: 0.002015117295743652 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Lassie Come Home ' ,
y: 0.0011976091754457114 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Pollyanna ' ,
y: 0.0009942515846627004 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Sunday in the Country ' ,
y: 0.0006392746042249663 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Little Dorrit ' ,
y: 0.0002718230805716641 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Miracle of Marcelino ' ,
y: 0.0001571871985288343 ,
sliced:true,
color:"#FFAA00",
},{
name: ' La traviata ' ,
y: 5.183099794285635e-05 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Before the Wrath ' ,
y: 2.8899579394195828e-05 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Through the Olive Trees ' ,
y: 1.0684890363175155e-05 ,
sliced:true,
color:"#FFAA00",
Getting the number of Tickets sold in the PG-13 rated category and the Names of the Movies from the 'Drama_DataFrame' dataframe.
var4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if x == 'PG-13':
        var4.append((Drama_DataFrame.Movie[i], Drama_DataFrame.Tickets[i]))
print(var4)
[('Gravity', 69369867), ('Sing', 63445479), ('Contagion', 13755159), ('Trouble with the Curve', 4781891), ('Burlesque', 9055268), ('Creed II', 21359152), ('The Post', 17974888), ('Hereafter', 10866027), ('Dream House', 4164217), ('Upside Down', 2638704), ('Anna Karenina', 7100463), ('Arrival', 20312789), ('Charlie St. Cloud', 4847808), ('Bridge of Spies', 16249834), ('The Impossible', 16959061), ('Paranoia', 1634077), ('Victor Frankenstein', 3112437), ('Water for Elephants', 11680972), ('Creed', 17356758), ('The Rite', 9714399), ('Collateral Beauty', 8530909), ('True Grit', 25227693), ('The Tree of Life', 6172183), ('The Longest Ride', 6380293), ('Step Up Revolution', 16555229), ('The Vow', 19761816), ('The Age of Adaline', 6898454), ('The Space Between Us', 1648140), ('Safe Haven', 9405095), ('Anonymous', 1581551), ('The Best of Me', 4105942), ('The Help', 21312000), ('Dear John', 14203351), ('The Lucky One', 9663383), ('The Giver', 6654020), ('Draft Day', 2984748), ('Rings', 8291728), ('Fences', 6428288), ('The Beaver', 504604), ('Me Before You', 20826520), ('The Light Between Oceans', 2228173), ('The Book Thief', 7608671), ('Labor Day', 1418981), ('Midnight Special', 768025), ('A Quiet Place', 33452229), ('Beastly', 3802823), ('The Roommate', 5254571), ('Remember Me', 5650612), ('The Woman in Black', 12895590), ('Country Strong', 2060199), ('One Day', 5916869), ('Suffragette', 3404491), ('The Perks of Being a Wallflower', 3306930), ('Project Almanac', 3290944), ('Wish Upon', 2347734), ('If I Stay', 7835617), ('Brooklyn', 6207614), ('Everything, Everything', 6160314), ('Mud', 3155696), ('Amour', 3678704), ('Ouija: Origin of Evil', 8183187), ('Black or White', 2197102), ('The Bye Bye Man', 3118773), ('Gifted', 3696466), ('The Words', 1636971), ('Lights Out', 14880651), ('Still Alice', 4169961), ('Before I Fall', 1894568), ('Rabbit Hole', 620503), ('Maggie', 102776), ('Anna', 120000), ('Ida', 1529836), ('Courageous', 3518588), ('Mustang', 555258), ('Like Crazy', 372840), ('Another Earth', 210278)]
After creating the 'var4' list, it is sorted by the number of tickets sold by each movie, in descending order.
var4.sort(key=lambda i:i[1],reverse=True)
print(var4)
[('Gravity', 69369867), ('Sing', 63445479), ('A Quiet Place', 33452229), ('True Grit', 25227693), ('Creed II', 21359152), ('The Help', 21312000), ('Me Before You', 20826520), ('Arrival', 20312789), ('The Vow', 19761816), ('The Post', 17974888), ('Creed', 17356758), ('The Impossible', 16959061), ('Step Up Revolution', 16555229), ('Bridge of Spies', 16249834), ('Lights Out', 14880651), ('Dear John', 14203351), ('Contagion', 13755159), ('The Woman in Black', 12895590), ('Water for Elephants', 11680972), ('Hereafter', 10866027), ('The Rite', 9714399), ('The Lucky One', 9663383), ('Safe Haven', 9405095), ('Burlesque', 9055268), ('Collateral Beauty', 8530909), ('Rings', 8291728), ('Ouija: Origin of Evil', 8183187), ('If I Stay', 7835617), ('The Book Thief', 7608671), ('Anna Karenina', 7100463), ('The Age of Adaline', 6898454), ('The Giver', 6654020), ('Fences', 6428288), ('The Longest Ride', 6380293), ('Brooklyn', 6207614), ('The Tree of Life', 6172183), ('Everything, Everything', 6160314), ('One Day', 5916869), ('Remember Me', 5650612), ('The Roommate', 5254571), ('Charlie St. Cloud', 4847808), ('Trouble with the Curve', 4781891), ('Still Alice', 4169961), ('Dream House', 4164217), ('The Best of Me', 4105942), ('Beastly', 3802823), ('Gifted', 3696466), ('Amour', 3678704), ('Courageous', 3518588), ('Suffragette', 3404491), ('The Perks of Being a Wallflower', 3306930), ('Project Almanac', 3290944), ('Mud', 3155696), ('The Bye Bye Man', 3118773), ('Victor Frankenstein', 3112437), ('Draft Day', 2984748), ('Upside Down', 2638704), ('Wish Upon', 2347734), ('The Light Between Oceans', 2228173), ('Black or White', 2197102), ('Country Strong', 2060199), ('Before I Fall', 1894568), ('The Space Between Us', 1648140), ('The Words', 1636971), ('Paranoia', 1634077), ('Anonymous', 1581551), ('Ida', 1529836), ('Labor Day', 1418981), ('Midnight Special', 768025), ('Rabbit Hole', 620503), ('Mustang', 555258), ('The Beaver', 504604), ('Like Crazy', 372840), ('Another Earth', 210278), ('Anna', 120000), ('Maggie', 102776)]
# Collect the ticket counts (second element of each tuple) and print their total.
all_to = [tickets for _, tickets in var4]
print(sum(all_to))
690767742
Using a for loop to print each PG-13 rated movie's name and its percentage of the PG-13 ticket total (690767742) as a chart data entry. The printed text is pasted into the cell below to build the interactive JavaScript graph.
for name, tickets in var4:
    print(' },{ \n name:',"'",name,"'",','+'\n y:',tickets/690767742*100,',','\n color:"#900C3F",')
},{
name: ' Gravity ' ,
y: 10.04243000681407 ,
color:"#900C3F",
},{
name: ' Sing ' ,
y: 9.184777334318545 ,
color:"#900C3F",
},{
name: ' A Quiet Place ' ,
y: 4.8427607379500355 ,
color:"#900C3F",
},{
name: ' True Grit ' ,
y: 3.652123784321127 ,
color:"#900C3F",
},{
name: ' Creed II ' ,
y: 3.0920888022590955 ,
color:"#900C3F",
},{
name: ' The Help ' ,
y: 3.08526277418438 ,
color:"#900C3F",
},{
name: ' Me Before You ' ,
y: 3.014981553669598 ,
color:"#900C3F",
},{
name: ' Arrival ' ,
y: 2.9406105359216386 ,
color:"#900C3F",
},{
name: ' The Vow ' ,
y: 2.860848125707642 ,
color:"#900C3F",
},{
name: ' The Post ' ,
y: 2.6021608866616703 ,
color:"#900C3F",
},{
name: ' Creed ' ,
y: 2.5126763953606854 ,
color:"#900C3F",
},{
name: ' The Impossible ' ,
y: 2.4551032089162033 ,
color:"#900C3F",
},{
name: ' Step Up Revolution ' ,
y: 2.3966418802457627 ,
color:"#900C3F",
},{
name: ' Bridge of Spies ' ,
y: 2.352430927499796 ,
color:"#900C3F",
},{
name: ' Lights Out ' ,
y: 2.154219152868317 ,
color:"#900C3F",
},{
name: ' Dear John ' ,
y: 2.0561688301883674 ,
color:"#900C3F",
},{
name: ' Contagion ' ,
y: 1.9912856613967362 ,
color:"#900C3F",
},{
name: ' The Woman in Black ' ,
y: 1.866848901001518 ,
color:"#900C3F",
},{
name: ' Water for Elephants ' ,
y: 1.691012954105202 ,
color:"#900C3F",
},{
name: ' Hereafter ' ,
y: 1.5730362521763501 ,
color:"#900C3F",
},{
name: ' The Rite ' ,
y: 1.4063191445323746 ,
color:"#900C3F",
},{
name: ' The Lucky One ' ,
y: 1.398933738860087 ,
color:"#900C3F",
},{
name: ' Safe Haven ' ,
y: 1.3615422996981814 ,
color:"#900C3F",
},{
name: ' Burlesque ' ,
y: 1.310899083645976 ,
color:"#900C3F",
},{
name: ' Collateral Beauty ' ,
y: 1.2349894879717762 ,
color:"#900C3F",
},{
name: ' Rings ' ,
y: 1.2003641015419624 ,
color:"#900C3F",
},{
name: ' Ouija: Origin of Evil ' ,
y: 1.1846510053157635 ,
color:"#900C3F",
},{
name: ' If I Stay ' ,
y: 1.1343345271615188 ,
color:"#900C3F",
},{
name: ' The Book Thief ' ,
y: 1.1014803583575563 ,
color:"#900C3F",
},{
name: ' Anna Karenina ' ,
y: 1.0279088857626475 ,
color:"#900C3F",
},{
name: ' The Age of Adaline ' ,
y: 0.9986647581467405 ,
color:"#900C3F",
},{
name: ' The Giver ' ,
y: 0.9632789135078054 ,
color:"#900C3F",
},{
name: ' Fences ' ,
y: 0.9306004911850675 ,
color:"#900C3F",
},{
name: ' The Longest Ride ' ,
y: 0.9236524249854158 ,
color:"#900C3F",
},{
name: ' Brooklyn ' ,
y: 0.8986542976119462 ,
color:"#900C3F",
},{
name: ' The Tree of Life ' ,
y: 0.8935250772031564 ,
color:"#900C3F",
},{
name: ' Everything, Everything ' ,
y: 0.8918068441012986 ,
color:"#900C3F",
},{
name: ' One Day ' ,
y: 0.8565641734903133 ,
color:"#900C3F",
},{
name: ' Remember Me ' ,
y: 0.8180190904166453 ,
color:"#900C3F",
},{
name: ' The Roommate ' ,
y: 0.7606856372282654 ,
color:"#900C3F",
},{
name: ' Charlie St. Cloud ' ,
y: 0.7017999980664992 ,
color:"#900C3F",
},{
name: ' Trouble with the Curve ' ,
y: 0.6922574273886692 ,
color:"#900C3F",
},{
name: ' Still Alice ' ,
y: 0.6036704881334775 ,
color:"#900C3F",
},{
name: ' Dream House ' ,
y: 0.6028389495929878 ,
color:"#900C3F",
},{
name: ' The Best of Me ' ,
y: 0.5944026841948274 ,
color:"#900C3F",
},{
name: ' Beastly ' ,
y: 0.5505212199095423 ,
color:"#900C3F",
},{
name: ' Gifted ' ,
y: 0.5351242936297972 ,
color:"#900C3F",
},{
name: ' Amour ' ,
y: 0.5325529517850589 ,
color:"#900C3F",
},{
name: ' Courageous ' ,
y: 0.5093735254359923 ,
color:"#900C3F",
},{
name: ' Suffragette ' ,
y: 0.4928561067635958 ,
color:"#900C3F",
},{
name: ' The Perks of Being a Wallflower ' ,
y: 0.47873254625720496 ,
color:"#900C3F",
},{
name: ' Project Almanac ' ,
y: 0.4764183096436486 ,
color:"#900C3F",
},{
name: ' Mud ' ,
y: 0.45683893559696653 ,
color:"#900C3F",
},{
name: ' The Bye Bye Man ' ,
y: 0.45149372363135043 ,
color:"#900C3F",
},{
name: ' Victor Frankenstein ' ,
y: 0.4505764833471335 ,
color:"#900C3F",
},{
name: ' Draft Day ' ,
y: 0.43209139896402404 ,
color:"#900C3F",
},{
name: ' Upside Down ' ,
y: 0.38199583442621154 ,
color:"#900C3F",
},{
name: ' Wish Upon ' ,
y: 0.339873137851304 ,
color:"#900C3F",
},{
name: ' The Light Between Oceans ' ,
y: 0.32256471524693753 ,
color:"#900C3F",
},{
name: ' Black or White ' ,
y: 0.3180666765993829 ,
color:"#900C3F",
},{
name: ' Country Strong ' ,
y: 0.29824771406305767 ,
color:"#900C3F",
},{
name: ' Before I Fall ' ,
y: 0.27426990069261226 ,
color:"#900C3F",
},{
name: ' The Space Between Us ' ,
y: 0.2385953917344334 ,
color:"#900C3F",
},{
name: ' The Words ' ,
y: 0.23697849515387473 ,
color:"#900C3F",
},{
name: ' Paranoia ' ,
y: 0.23655954102153195 ,
color:"#900C3F",
},{
name: ' Anonymous ' ,
y: 0.22895553799615617 ,
color:"#900C3F",
},{
name: ' Ida ' ,
y: 0.2214689405690285 ,
color:"#900C3F",
},{
name: ' Labor Day ' ,
y: 0.20542085475670635 ,
color:"#900C3F",
},{
name: ' Midnight Special ' ,
y: 0.1111842596726238 ,
color:"#900C3F",
},{
name: ' Rabbit Hole ' ,
y: 0.0898280221081893 ,
color:"#900C3F",
},{
name: ' Mustang ' ,
y: 0.08038273449080661 ,
color:"#900C3F",
},{
name: ' The Beaver ' ,
y: 0.07304973427667674 ,
color:"#900C3F",
},{
name: ' Like Crazy ' ,
y: 0.05397472657314678 ,
color:"#900C3F",
},{
name: ' Another Earth ' ,
y: 0.03044120146536895 ,
color:"#900C3F",
},{
name: ' Anna ' ,
y: 0.017371975079867003 ,
color:"#900C3F",
},{
name: ' Maggie ' ,
y: 0.014878517590070093 ,
color:"#900C3F",
Putting the ticket counts of every movie, across all rating categories, into a single list.
ttl = var1+var6+var9+var12+var
print(ttl)
[2041284, 1735627, 100840, 27784, 161478, 2041222, 2035075, 9841006, 49606, 1512116, 102215, 6709192, 2041284, 1946584, 3775075, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 465911, 6516743, 123684, 266194, 20557, 2041284, 209430, 345342, 278354, 10309, 5028356, 389424, 203892, 69087, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 62729, 191417, 256182, 100840, 102215, 147081, 18004778, 9606872, 1582698, 30460471, 9267895, 7397524, 1223150, 970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 53482, 3730633, 4749492, 1934462, 3874173, 11483011, 4354536, 1894842, 343874, 13758706, 6460576, 3347330, 8913705, 852629, 6466787, 10626997, 3565613, 398777, 702550, 15203638, 17112033, 1383513, 1485939, 13458278, 610182, 6395497, 1076996, 3225544, 1516446, 12795619, 281948, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492078, 328123, 1492375, 12505269, 54936832, 666802, 19908, 6489267, 478679, 844312, 204489, 240000, 170591, 8000894, 4800000, 19549, 241114, 102523, 1858714, 872124, 4030, 1001545, 8069354, 43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 6913186, 451700, 14398571, 1001545, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 10900, 4770742, 3019441, 6550000, 760038, 1200000, 59286, 69369867, 63445479, 13755159, 4781891, 9055268, 21359152, 17974888, 10866027, 4164217, 2638704, 7100463, 20312789, 4847808, 16249834, 16959061, 1634077, 3112437, 11680972, 17356758, 9714399, 8530909, 25227693, 6172183, 6380293, 16555229, 19761816, 6898454, 1648140, 9405095, 1581551, 4105942, 21312000, 14203351, 9663383, 6654020, 2984748, 8291728, 6428288, 504604, 20826520, 2228173, 7608671, 1418981, 768025, 33452229, 3802823, 5254571, 5650612, 12895590, 2060199, 5916869, 3404491, 3306930, 3290944, 2347734, 7835617, 6207614, 6160314, 3155696, 3678704, 8183187, 2197102, 3118773, 3696466, 1636971, 14880651, 4169961, 1894568, 620503, 102776, 120000, 1529836, 3518588, 555258, 372840, 
210278, 14263436, 44994832, 5446297, 36856719, 8415403, 38139849, 37135062, 7496685, 13461244, 57099810, 5064742, 2468752, 16055844, 679277, 7773592, 3239868, 3105473, 406502, 3801787, 4660405, 2827040, 372775, 771963, 821757, 758501, 1117372, 52873, 33126671, 3835839, 3626278, 1183113, 1985917, 3583071, 1203491, 4284352, 5617894, 7013390, 217962, 2181730, 7773387, 1076528, 38295, 1749924, 282101, 1753600, 497202, 5727305, 67948, 4045452, 2043323, 3896904, 2325193, 1661076, 1613155, 1129532, 1015342, 208839, 632852, 2127029, 1424493, 1656624, 543891, 115631, 85240, 6238, 42945, 276978, 5476692, 7721184, 3471817, 195168, 63680, 244758, 917129, 325608, 1300000, 1100000]
Adding up the ticket counts of every movie across all rating categories and storing the total in a variable.
tt2 = sum(ttl)
print(tt2)
2121357473
j = 690767742+377168119+499784567+103856414+449780631
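The expression assigned to `j` above adds five per-rating totals. A minimal sketch of turning that into an explicit check against the grand total `tt2`; the labels attached to the last two figures (NC-17 and R) are an assumption inferred from the percentages reported in this section, and the names `rating_totals` and `grand_total` are hypothetical:

```python
# Per-rating ticket totals as they appear in the sum assigned to j.
rating_totals = {
    "PG-13": 690767742,
    "G": 377168119,
    "PG": 499784567,
    "NC-17": 103856414,  # label inferred, see the note above
    "R": 449780631,      # label inferred, see the note above
}

# Grand total printed for tt2.
grand_total = 2121357473

# If any movie were counted twice or dropped, the two totals would differ.
assert sum(rating_totals.values()) == grand_total

# Percentage share of each rating category.
shares = {rating: total / grand_total * 100 for rating, total in rating_totals.items()}
for rating, share in shares.items():
    print(rating, round(share, 2))
```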
Getting the percentage of all tickets sold in this dataframe that belong to the R-rated category.
v = sum(var)
v/tt2*100
21.20249117485726
Getting the average number of tickets sold by R-rated Drama movies.
v/len(var)
5841306.896103896
Getting the percentage of all tickets sold in this dataframe that belong to the NC-17 rated category.
# var3 must hold the NC-17 ticket counts (integers) here, not the (movie, tickets) tuples built for the G-rated chart.
v1 = sum(var3)
v1/tt2*100
4.895752616984794
Getting the average number of tickets sold by NC-17 rated Drama movies.
v1/len(var3)
2119518.6530612246
Getting the percentage of all tickets sold in this dataframe that belong to the PG-rated category.
v2 = sum(var6)
v2/tt2*100
23.559658066172613
Getting the average number of tickets sold by PG-rated Drama movies.
v2/len(var6)
7459471.149253732
Getting the percentage of all tickets sold in this dataframe that belong to the G-rated category.
v3 = sum(var9)
v3/tt2*100
17.779564444016742
Getting the average number of tickets sold by G-rated Drama movies.
v3/len(var9)
11093179.970588235
Getting the percentage of all tickets sold in this dataframe that belong to the PG-13 rated category.
v4 = sum(var12)
v4/tt2*100
32.56253369796859
Getting the average number of tickets sold by PG-13 rated Drama movies.
v4/len(var12)
9089049.236842105
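The percentage and average cells above repeat the same two calculations for every rating. A minimal sketch of a shared helper, with a hypothetical name and hypothetical input numbers that are not taken from the dataset:

```python
def share_and_mean(tickets, grand_total):
    """Return (percentage of grand_total, mean tickets per movie)."""
    subtotal = sum(tickets)
    share = subtotal / grand_total * 100
    mean = subtotal / len(tickets)
    return share, mean

# Hypothetical numbers for illustration only; not values from the dataset.
share, mean = share_and_mean([100, 300], grand_total=1000)
print(share, mean)
```

In the notebook this would be called once per rating, for example with `var` and `tt2` for the R-rated category.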
Getting the number of tickets sold by R-rated movies that did not lose money (Profit >= 0).
r_ticks = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'R':
        r_ticks.append(Drama_DataFrame.Tickets[i])
print(r_ticks)
[44994832, 36856719, 8415403, 38139849, 37135062, 7496685, 13461244, 57099810, 5064742, 16055844, 7773592, 3239868, 3105473, 3801787, 4660405, 2827040, 33126671, 3835839, 3626278, 1985917, 3583071, 1203491, 4284352, 5617894, 7013390, 2181730, 7773387, 1076528, 1749924, 1753600, 497202, 5727305, 4045452, 2043323, 3896904, 2325193, 1661076, 1613155, 1129532, 1015342, 208839, 632852, 2127029, 1424493, 1656624, 543891, 115631, 42945, 276978, 5476692, 7721184, 3471817, 195168, 325608, 1300000, 1100000]
Getting the production budget of R-rated movies that did not lose money (Profit >= 0).
r_bud = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'R':
        r_bud.append(Drama_DataFrame.Production_Budget[i])
print(r_bud)
[100000000.0, 61000000.0, 60000000.0, 55000000.0, 55000000.0, 55000000.0, 52500000.0, 40000000.0, 37500000.0, 31000000.0, 23000000.0, 22500000.0, 22500000.0, 21000000.0, 20000000.0, 20000000.0, 13000000.0, 13000000.0, 13000000.0, 12000000.0, 12000000.0, 12000000.0, 11800000.0, 11000000.0, 10000000.0, 9400000.0, 8500000.0, 7000000.0, 5000000.0, 4900000.0, 4750000.0, 4000000.0, 3500000.0, 3400000.0, 3300000.0, 3000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 2000000.0, 1987650.0, 1500000.0, 1000000.0, 1000000.0, 1000000.0, 135000.0, 100000.0, 6000000.0, 8500000.0, 20000000.0, 100000.0, 2700000.0, 11500000.0, 9000000.0]
Getting the Return On Investment (profit as a percentage of production budget) for R-rated movies that did not lose money.
percent_return_on_investment = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'R':
        # Use a separate name so the loop index i is not overwritten.
        roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
        percent_return_on_investment.append(round(roi, 0))
print(percent_return_on_investment)
[350.0, 504.0, 40.0, 593.0, 575.0, 36.0, 156.0, 1327.0, 35.0, 418.0, 238.0, 44.0, 38.0, 81.0, 133.0, 41.0, 2448.0, 195.0, 179.0, 65.0, 199.0, 0.0, 263.0, 411.0, 601.0, 132.0, 815.0, 54.0, 250.0, 258.0, 5.0, 1332.0, 1056.0, 501.0, 1081.0, 675.0, 731.0, 707.0, 465.0, 408.0, 4.0, 216.0, 970.0, 850.0, 1557.0, 444.0, 16.0, 218.0, 2670.0, 813.0, 808.0, 74.0, 1852.0, 21.0, 13.0, 22.0]
Getting the Net Profit Margin (profit as a percentage of worldwide gross) for R-rated movies that did not lose money.
net_profit = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'R':
        net_profit.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(net_profit)
[77, 83, 28, 85, 85, 26, 60, 92, 25, 80, 70, 30, 27, 44, 57, 29, 96, 66, 64, 39, 66, 0, 72, 80, 85, 56, 89, 34, 71, 72, 4, 93, 91, 83, 91, 87, 87, 87, 82, 80, 4, 68, 90, 89, 93, 81, 13, 68, 96, 89, 88, 42, 94, 17, 11, 18]
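The two ratios computed above can be written as small functions. This is a sketch with hypothetical function names; the example figures assume that profit equals worldwide gross minus production budget, which is an assumption about how the `Profit` column was built and is not stated in this section:

```python
def roi_percent(profit, budget):
    """Return On Investment: profit as a percentage of the production budget."""
    return round(profit / budget * 100, 0)

def net_profit_margin_percent(profit, gross):
    """Net Profit Margin: profit as a percentage of worldwide gross.

    int() truncates toward zero, as in the loop above, so 77.8 becomes 77.
    """
    return int(profit / gross * 100)

# Illustrative figures: a 100M budget and 350M profit, hence a 450M gross
# under the gross = budget + profit assumption.
budget = 100_000_000.0
profit = 350_000_000.0
gross = budget + profit

print(roi_percent(profit, budget))
print(net_profit_margin_percent(profit, gross))
```

Note that the ROI values are rounded while the margins are truncated, so the two lists are not rounded the same way; using `round` in both places would be one way to make them consistent.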
Building a list of 56 'R' labels, one for each R-rated movie in the lists above, for the JavaScript graph below.
system1 = []
for i in range(56):
    system1.append('R')
print(system1)
['R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R', 'R']
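The label count of 56 is typed by hand, so it would silently go out of step if the filtered lists changed. A minimal sketch of deriving the labels from the data instead; only the first three `r_ticks` values are copied here to keep the example short:

```python
# First three values of r_ticks from the output above, for illustration.
r_ticks = [44994832, 36856719, 8415403]

# Repeat the label once per movie so the two lists always have equal length.
system1 = ["R"] * len(r_ticks)

print(system1)
```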
Getting the number of tickets sold by PG-rated movies that did not lose money (Profit >= 0).
pg_ticks = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'PG':
        pg_ticks.append(Drama_DataFrame.Tickets[i])
print(pg_ticks)
[18004778, 9606872, 30460471, 9267895, 7397524, 1223150, 970960, 4691829, 54235135, 7398690, 30593772, 21660121, 3810299, 2711800, 4749492, 1934462, 3874173, 11483011, 1894842, 13758706, 6460576, 3347330, 8913705, 6466787, 10626997, 15203638, 17112033, 1383513, 13458278, 610182, 6395497, 1516446, 12795619, 4344029, 1781521, 15729752, 3585605, 11928543, 4071696, 1492375, 12505269, 54936832, 6489267, 844312, 8000894, 4800000]
Getting the production budget of PG-rated movies that did not lose money (Profit >= 0).
pg_bud = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'PG':
        pg_bud.append(Drama_DataFrame.Production_Budget[i])
print(pg_bud)
[180000000.0, 37000000.0, 20000000.0, 20000000.0, 3000000.0, 1700000.0, 5100000.0, 10000000.0, 95000000.0, 3000000.0, 20000000.0, 40000000.0, 5000000.0, 422000.0, 11800000.0, 15000000.0, 32000000.0, 40000000.0, 8000000.0, 17000000.0, 30000000.0, 500000.0, 20000000.0, 2000000.0, 23000000.0, 32000000.0, 90000000.0, 10000000.0, 16000000.0, 3000000.0, 15000000.0, 10000000.0, 20000000.0, 12000000.0, 5000000.0, 7000000.0, 14000000.0, 15000000.0, 12000000.0, 7500000.0, 17000000.0, 5000000.0, 22000000.0, 4500000.0, 8200000.0, 28000000.0]
Getting the Return On Investment (profit as a percentage of production budget) for PG-rated movies that did not lose money.
percent_return_on_investment1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'PG':
        # Use a separate name so the loop index i is not overwritten.
        roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
        percent_return_on_investment1.append(round(roi, 0))
print(percent_return_on_investment1)
[0.0, 160.0, 1423.0, 363.0, 2366.0, 620.0, 90.0, 369.0, 471.0, 2366.0, 1430.0, 442.0, 662.0, 6326.0, 302.0, 29.0, 21.0, 187.0, 137.0, 709.0, 115.0, 6595.0, 346.0, 3133.0, 362.0, 375.0, 90.0, 38.0, 741.0, 103.0, 326.0, 52.0, 540.0, 262.0, 256.0, 2147.0, 156.0, 695.0, 239.0, 99.0, 636.0, 10887.0, 195.0, 88.0, 876.0, 71.0]
Getting the Net Profit Margin (profit as a percentage of worldwide gross) for PG-rated movies that did not lose money.
net_profit1 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'PG':
        net_profit1.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(net_profit1)
[0, 61, 93, 78, 95, 86, 47, 78, 82, 95, 93, 81, 86, 98, 75, 22, 17, 65, 57, 87, 53, 98, 77, 96, 78, 78, 47, 27, 88, 50, 76, 34, 84, 72, 71, 95, 60, 87, 70, 49, 86, 99, 66, 46, 89, 41]
Building a list of 46 'PG' labels, one for each PG-rated movie in the lists above, for the JavaScript graph below.
system2 = []
for i in range(46):
    system2.append('PG')
print(system2)
['PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG', 'PG']
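Each rating above walks the dataframe four times, once per metric. A minimal sketch of collecting all four metrics for every rating in a single pass; the function name, the dictionary keys, and the sample rows are hypothetical and are not values from the dataset:

```python
from collections import defaultdict

def profitable_metrics(rows):
    """Group tickets, budget, ROI and margin by rating for rows with profit >= 0."""
    grouped = defaultdict(
        lambda: {"tickets": [], "budget": [], "roi": [], "margin": []}
    )
    for row in rows:
        if row["profit"] < 0:
            continue  # skip movies that lost money, as the loops above do
        bucket = grouped[row["rating"]]
        bucket["tickets"].append(row["tickets"])
        bucket["budget"].append(row["budget"])
        bucket["roi"].append(round(row["profit"] / row["budget"] * 100, 0))
        bucket["margin"].append(int(row["profit"] / row["gross"] * 100))
    return dict(grouped)

# Hypothetical rows for illustration only; not values from the dataset.
rows = [
    {"rating": "R", "tickets": 1000, "budget": 50.0, "gross": 200.0, "profit": 150.0},
    {"rating": "PG", "tickets": 2000, "budget": 100.0, "gross": 150.0, "profit": 50.0},
    {"rating": "R", "tickets": 500, "budget": 80.0, "gross": 40.0, "profit": -40.0},
]
result = profitable_metrics(rows)
print(result)
```

With a dataframe, the rows could come from `Drama_DataFrame.to_dict("records")` after renaming the columns to the keys used here.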
Getting the number of tickets sold by G-rated movies that did not lose money (Profit >= 0).
g_ticks = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0:
        continue
    elif x == 'G':
        g_ticks.append(Drama_DataFrame.Tickets[i])
print(g_ticks)
[241114, 1858714, 8069354, 43865684, 6694795, 2746962, 3779964, 32550000, 24610000, 375000, 451700, 14398571, 1765797, 8049152, 31128100, 28621420, 9048232, 98621487, 26800000, 7207164, 4770742, 3019441, 6550000, 760038, 1200000]
Getting the Budget from movies that made Profit that are G-rated.
g_bud = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'G':
        g_bud.append(Drama_DataFrame.Production_Budget[i])
print(g_bud)
[700000.0, 7000000.0, 22000000.0, 20000000.0, 23000000.0, 15000000.0, 2700000.0, 70000000.0, 30000000.0, 2500000.0, 666000.0, 85000000.0, 10000000.0, 22000000.0, 18000000.0, 8200000.0, 60000000.0, 45000000.0, 858000.0, 17000000.0, 10000000.0, 6400000.0, 13000000.0, 1750000.0, 1700000.0]
Getting the Return On Investment for the movies that made Profit that are G-rated.
percent_return_on_investment2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    # Skip movies that lost money
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'G':
        # ROI (%) = profit / production budget * 100
        roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
        percent_return_on_investment2.append(round(roi, 0))
print(percent_return_on_investment2)
[244.0, 166.0, 267.0, 2093.0, 191.0, 83.0, 1300.0, 365.0, 720.0, 50.0, 578.0, 69.0, 77.0, 266.0, 1629.0, 3390.0, 51.0, 2092.0, 31135.0, 324.0, 377.0, 372.0, 404.0, 334.0, 606.0]
Getting the Net Profit Margin for the movies that made Profit that are G-rated.
net_profit2 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'G':
        # NPM (%) = profit / worldwide gross * 100, truncated to an integer
        net_profit2.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(net_profit2)
[70, 62, 72, 95, 65, 45, 92, 78, 87, 33, 85, 40, 43, 72, 94, 97, 33, 95, 99, 76, 79, 78, 80, 76, 85]
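Building a list with 'G' repeated 25 times, one label per G-rated movie that made Profit. It becomes part of the 'System' column of the dataframe used for the Javascript graph below.
system3 = []
for i in range(25):
    system3.append('G')
print(system3)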
['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G']
Getting the Amount of Tickets from movies that made Profit that are PG-13 rated.
pg13_ticks = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'PG-13':
        pg13_ticks.append(Drama_DataFrame.Tickets[i])
print(pg13_ticks)
[69369867, 63445479, 13755159, 9055268, 21359152, 17974888, 10866027, 7100463, 20312789, 4847808, 16249834, 16959061, 11680972, 17356758, 9714399, 8530909, 25227693, 6172183, 6380293, 16555229, 19761816, 6898454, 9405095, 4105942, 21312000, 14203351, 9663383, 6654020, 2984748, 8291728, 6428288, 20826520, 2228173, 7608671, 33452229, 3802823, 5254571, 5650612, 12895590, 2060199, 5916869, 3404491, 3306930, 3290944, 2347734, 7835617, 6207614, 6160314, 3155696, 3678704, 8183187, 2197102, 3118773, 3696466, 1636971, 14880651, 4169961, 1894568, 620503, 1529836, 3518588, 555258, 372840, 210278]
Getting the Budget from movies that made Profit that are PG-13 rated.
pg13_bud = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'PG-13':
        pg13_bud.append(Drama_DataFrame.Production_Budget[i])
print(pg13_bud)
[110000000.0, 75000000.0, 60000000.0, 55000000.0, 50000000.0, 50000000.0, 50000000.0, 49000000.0, 47000000.0, 44000000.0, 40000000.0, 40000000.0, 38000000.0, 37000000.0, 37000000.0, 36000000.0, 35000000.0, 35000000.0, 34000000.0, 33000000.0, 30000000.0, 30000000.0, 28000000.0, 26000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 25000000.0, 24000000.0, 20000000.0, 20000000.0, 19000000.0, 17000000.0, 17000000.0, 16000000.0, 16000000.0, 15000000.0, 15000000.0, 15000000.0, 14000000.0, 13000000.0, 12000000.0, 12000000.0, 11000000.0, 11000000.0, 10000000.0, 10000000.0, 9700000.0, 9000000.0, 9000000.0, 7400000.0, 7000000.0, 6000000.0, 5000000.0, 5000000.0, 5000000.0, 5000000.0, 2600000.0, 2000000.0, 1400000.0, 250000.0, 175000.0]
Getting the Return On Investment for the movies that made Profit that are PG-13 rated.
percent_return_on_investment3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    # Skip movies that lost money
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'PG-13':
        # ROI (%) = profit / production budget * 100
        roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
        percent_return_on_investment3.append(round(roi, 0))
print(percent_return_on_investment3)
[531.0, 746.0, 129.0, 65.0, 327.0, 259.0, 117.0, 45.0, 332.0, 10.0, 306.0, 324.0, 207.0, 369.0, 163.0, 137.0, 621.0, 76.0, 88.0, 402.0, 559.0, 130.0, 236.0, 58.0, 752.0, 468.0, 287.0, 166.0, 19.0, 232.0, 168.0, 941.0, 11.0, 300.0, 1868.0, 124.0, 228.0, 253.0, 760.0, 37.0, 294.0, 143.0, 154.0, 174.0, 96.0, 612.0, 464.0, 516.0, 216.0, 279.0, 809.0, 144.0, 321.0, 428.0, 173.0, 2876.0, 734.0, 279.0, 24.0, 488.0, 1659.0, 297.0, 1391.0, 1102.0]
Getting the Net Profit Margin for the movies that made Profit that are PG-13 rated.
net_profit3 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'PG-13':
        # NPM (%) = profit / worldwide gross * 100, truncated to an integer
        net_profit3.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(net_profit3)
[84, 88, 56, 39, 76, 72, 53, 30, 76, 9, 75, 76, 67, 78, 61, 57, 86, 43, 46, 80, 84, 56, 70, 36, 88, 82, 74, 62, 16, 69, 62, 90, 10, 75, 94, 55, 69, 71, 88, 27, 74, 58, 60, 63, 48, 85, 82, 83, 68, 73, 89, 59, 76, 81, 63, 96, 88, 73, 19, 83, 94, 74, 93, 91]
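Building a list with 'PG-13' repeated 64 times, one label per PG-13 rated movie that made Profit. It becomes part of the 'System' column of the dataframe used for the Javascript graph below.
system4 = []
for i in range(64):
    system4.append('PG-13')
print(system4)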
['PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13', 'PG-13']
Getting the Amount of Tickets from movies that made Profit that are NC-17 rated.
nc17_ticks = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'NC-17':
        nc17_ticks.append(Drama_DataFrame.Tickets[i])
print(nc17_ticks)
[2041284, 1735627, 100840, 27784, 161478, 2041222, 9841006, 1512116, 6709192, 2041284, 1946584, 1530711, 2041284, 1946584, 1656624, 231503, 382224, 21312000, 6516743, 266194, 2041284, 345342, 5028356, 389424, 203892, 900000, 2041222, 10117304, 3614771, 41380, 6516743, 574645, 100840, 147081]
Getting the Budget from movies that made Profit that are NC-17 rated.
nc17_bud = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'NC-17':
        nc17_bud.append(Drama_DataFrame.Production_Budget[i])
print(nc17_bud)
[6500000.0, 12500000.0, 1000000.0, 20000.0, 955472.0, 1500000.0, 9000000.0, 15000000.0, 15000000.0, 6500000.0, 4000000.0, 15000000.0, 6500000.0, 4074940.0, 1000000.0, 1000000.0, 3565572.0, 12000000.0, 15000000.0, 350000.0, 6500000.0, 904765.0, 34000000.0, 230000.0, 1000000.0, 1000000.0, 1500000.0, 6500000.0, 1250000.0, 12000.0, 15000000.0, 2200000.0, 50000.0, 612072.0]
Getting the Return On Investment for the movies that made Profit that are NC-17 rated.
percent_return_on_investment4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    # Skip movies that lost money
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'NC-17':
        # ROI (%) = profit / production budget * 100
        roi = (Drama_DataFrame.Profit[i] / Drama_DataFrame.Production_Budget[i]) * 100
        percent_return_on_investment4.append(round(roi, 0))
print(percent_return_on_investment4)
[214.0, 39.0, 1.0, 1289.0, 69.0, 1261.0, 993.0, 1.0, 347.0, 214.0, 387.0, 2.0, 214.0, 378.0, 1557.0, 132.0, 7.0, 1676.0, 334.0, 661.0, 214.0, 282.0, 48.0, 1593.0, 104.0, 800.0, 1261.0, 1457.0, 2792.0, 3348.0, 334.0, 161.0, 1917.0, 140.0]
Getting the Net Profit Margin for the movies that made Profit that are NC-17 rated.
# NPM (%) = profit / worldwide gross * 100, truncated to an integer
net_profit4 = []
for i, x in enumerate(Drama_DataFrame.Rating):
    if Drama_DataFrame.Profit[i] < 0: continue
    elif x == 'NC-17':
        net_profit4.append(int((Drama_DataFrame.Profit[i] / Drama_DataFrame.Worldwide_Gross[i]) * 100))
print(net_profit4)
[68, 27, 0, 92, 40, 92, 90, 0, 77, 68, 79, 2, 68, 79, 93, 56, 6, 94, 76, 86, 68, 73, 32, 94, 50, 88, 92, 93, 96, 97, 76, 61, 95, 58]
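Building a list with 'NC-17' repeated 34 times, one label per NC-17 rated movie that made Profit. It becomes part of the 'System' column of the dataframe used for the Javascript graph below.
system5 = []
for i in range(34):
    system5.append('NC-17')
print(system5)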
['NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17', 'NC-17']
Creating a dataframe called 'df_data' that will be used to get some findings for the Javascript graphs below. This dataframe consists of the Name, Budget, Profit, Return On Investment, Number of Tickets, Net Profit Margin, System Rating and release Season of the movies that made Profit in the 'Drama_DataFrame' dataframe, which was created at the beginning of this project.
df_data = pd.DataFrame({
    'Budget': r_bud + pg_bud + g_bud + pg13_bud + nc17_bud,
    'Season': season_r + season_pg + season_g + season_pg13 + season_nc17,
    'Profit': profit_int + profit_int1 + profit_int2 + profit_int3 + profit_int4,
    'Name': name + name1 + name2 + name3 + name4,
    'No.Tickets': r_ticks + pg_ticks + g_ticks + pg13_ticks + nc17_ticks,
    'ROI': percent_return_on_investment + percent_return_on_investment1 +
           percent_return_on_investment2 + percent_return_on_investment3 +
           percent_return_on_investment4,
    'NPM': net_profit + net_profit1 + net_profit2 + net_profit3 + net_profit4,
    'System': system1 + system2 + system3 + system4 + system5
})
This is the 'df_data' dataframe.
df_data
| | Budget | Season | Profit | Name | No.Tickets | ROI | NPM | System |
|---|---|---|---|---|---|---|---|---|
| 0 | 100000000.0 | 1 | 349948323 | Django Unchained | 44994832 | 350.0 | 77 | R |
| 1 | 61000000.0 | 4 | 307567189 | Gone Girl | 36856719 | 504.0 | 83 | R |
| 2 | 60000000.0 | 2 | 24154026 | Priest | 8415403 | 40.0 | 28 | R |
| 3 | 55000000.0 | 1 | 326398492 | Fifty Shades Darker | 38139849 | 593.0 | 85 | R |
| 4 | 55000000.0 | 1 | 316350619 | Fifty Shades Freed | 37135062 | 575.0 | 85 | R |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 220 | 12000.0 | 2 | 401802 | Pink Flamingos | 41380 | 3348.0 | 97 | NC-17 |
| 221 | 15000000.0 | 4 | 50167430 | Lust, Caution | 6516743 | 334.0 | 76 | NC-17 |
| 222 | 2200000.0 | 4 | 3546453 | Happiness 1998 | 574645 | 161.0 | 61 | NC-17 |
| 223 | 50000.0 | 4 | 958404 | Whore 1991 | 100840 | 1917.0 | 95 | NC-17 |
| 224 | 612072.0 | 2 | 858737 | Law of Desire | 147081 | 140.0 | 58 | NC-17 |
225 rows × 8 columns
Checking the season in which each movie in the R-rated category was released. Based on the output below, 14 movies were released in 'Winter' (code 1), 11 in 'Spring' (code 2), 7 in 'Summer' (code 3) and 24 in 'Autumn' (code 4).
collections.Counter(season_r)
Counter({1: 14, 4: 24, 2: 11, 3: 7})
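The counter above is keyed by numeric season codes. The following is a minimal sketch of counting by season name instead; the mapping 1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn is inferred from the counts quoted in the text, and `season_codes` is a hypothetical sample rather than the real `season_r` list.

```python
import collections

# Assumed code-to-name mapping, inferred from the counts quoted in the text.
SEASON_NAMES = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"}

season_codes = [1, 4, 4, 2, 3, 1, 4]  # hypothetical sample

# Translate each code to its name before counting.
counts = collections.Counter(SEASON_NAMES[code] for code in season_codes)
print(counts)
```

Counting by name makes the output self-explanatory, so the prose does not have to repeat the code-to-season mapping for every rating.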
Getting the index of the R-rated movies that were released in Winter, Spring, Summer and Autumn.
index_one = []
index_two = []
index_three = []
index_four = []
for x, i in enumerate(df_data.System):
    if i == 'R':
        if df_data['Season'][x] == 1: index_one.append(x)
        elif df_data['Season'][x] == 2: index_two.append(x)
        elif df_data['Season'][x] == 3: index_three.append(x)
        elif df_data['Season'][x] == 4: index_four.append(x)
Getting the number of Tickets sold in the Winter by R-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['No.Tickets'][i]))
total1 = sum(sum1)
total1
240614259
Getting the Average Net Profit Margin made in the Winter by R-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['NPM'][i]))
total2 = sum(sum1)
total2/14
72.5
Getting the Amount of Expenses spent in the Winter to produce R-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['Budget'][i]))
total3 = sum(sum1)
total3
357700000
Getting the number of Tickets sold in the Spring by R-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['No.Tickets'][i]))
total4 = sum(sum2)
total4
30059567
Getting the Average Net Profit Margin made in the Spring by R-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['NPM'][i]))
total5 = sum(sum2)
total5/11
50.63636363636363
Getting the Amount of Expenses spent in the Spring to produce R-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['Budget'][i]))
total6 = sum(sum2)
total6
125635000
Getting the number of Tickets sold in the Summer by R-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['No.Tickets'][i]))
total7 = sum(sum3)
total7
23778392
Getting the Average Net Profit Margin made in the Summer by R-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['NPM'][i]))
total8 = sum(sum3)
total8/7
77.14285714285714
Getting the Amount of Expenses spent in the Summer to produce R-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['Budget'][i]))
total9 = sum(sum3)
total9
58100000
Getting the number of Tickets sold in the Autumn by R-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['No.Tickets'][i]))
total10 = sum(sum4)
total10
125062444
Getting the Average Net Profit Margin made in the Autumn by R-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['NPM'][i]))
total11 = sum(sum4)
total11/24
59.25
Getting the Amount of Expenses spent in the Autumn to produce R-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['Budget'][i]))
total12 = sum(sum4)
total12
375637650
Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by R-rated Drama Movies in a list for the Javascript graph below.
r_total_tick = [total1/14, total4/11, total7/7, total10/24]
print(r_total_tick)
[17186732.785714287, 2732687.909090909, 3396913.1428571427, 5210935.166666667]
Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by R-rated Drama Movies in a list for the Javascript graph below.
r_total_pro = [total2/14, total5/11, total8/7, total11/24]
print(r_total_pro)
[72.5, 50.63636363636363, 77.14285714285714, 59.25]
Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn on R-rated Drama Movies in a list for the Javascript graph below.
r_total_bud = [total3/14, total6/11, total9/7, total12/24]
print(r_total_bud)
[25550000.0, 11421363.636363637, 8300000.0, 15651568.75]
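The per-season totals and averages above are computed one loop at a time. As a sketch only, the same kind of summary can be obtained in one step with a pandas `groupby`; the frame `toy` below uses the same column names as 'df_data' but its values are made up.

```python
import pandas as pd

# Made-up rows with the same column names as df_data.
toy = pd.DataFrame({
    "System": ["R", "R", "R", "PG", "PG"],
    "Season": [1, 1, 4, 1, 4],
    "No.Tickets": [100, 300, 50, 80, 120],
    "NPM": [70, 80, 60, 90, 50],
    "Budget": [10.0, 30.0, 20.0, 40.0, 60.0],
})

# Mean tickets, NPM and budget for every (rating, season) pair.
season_means = (
    toy.groupby(["System", "Season"])[["No.Tickets", "NPM", "Budget"]]
       .mean()
)
print(season_means)

# Average tickets for R-rated movies released in season 1.
print(season_means.loc[("R", 1), "No.Tickets"])  # 200.0
```

Grouping this way removes the need to hard-code the number of movies per season as a divisor.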
Checking the season in which each movie in the PG-rated category was released. Based on the output below, 13 movies were released in 'Winter', 9 in 'Spring', 11 in 'Summer' and 13 in 'Autumn'.
collections.Counter(season_pg)
Counter({4: 13, 2: 9, 3: 11, 1: 13})
Getting the index of the PG-rated movies that were released in Winter, Spring, Summer and Autumn.
index_one = []
index_two = []
index_three = []
index_four = []
for x, i in enumerate(df_data.System):
    if i == 'PG':
        if df_data['Season'][x] == 1: index_one.append(x)
        if df_data['Season'][x] == 2: index_two.append(x)
        if df_data['Season'][x] == 3: index_three.append(x)
        if df_data['Season'][x] == 4: index_four.append(x)
Getting the number of Tickets sold in the Winter by PG-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['No.Tickets'][i]))
total13 = sum(sum1)
total13
109181083
Getting the Average Net Profit Margin made in the Winter by PG-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['NPM'][i]))
total14 = sum(sum1)
total14/13
79.46153846153847
Getting the Amount of Expenses spent in the Winter to produce PG-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['Budget'][i]))
total15 = sum(sum1)
total15
182122000
Getting the number of Tickets sold in the Spring by PG-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['No.Tickets'][i]))
total16 = sum(sum2)
total16
100311458
Getting the Average Net Profit Margin made in the Spring by PG-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['NPM'][i]))
total17 = sum(sum2)
total17/9
65.55555555555556
Getting the Amount of Expenses spent in the Spring to produce PG-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['Budget'][i]))
total18 = sum(sum2)
total18
204500000
Getting the number of Tickets sold in the Summer by PG-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['No.Tickets'][i]))
total19 = sum(sum3)
total19
131797019
Getting the Average Net Profit Margin made in the Summer by PG-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['NPM'][i]))
total20 = sum(sum3)
total20/11
75.36363636363636
Getting the Amount of Expenses spent in the Summer to produce PG-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['Budget'][i]))
total21 = sum(sum3)
total21
222500000
Getting the number of Tickets sold in the Autumn by PG-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['No.Tickets'][i]))
total22 = sum(sum4)
total22
133239118
Getting the Average Net Profit Margin made in the Autumn by PG-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['NPM'][i]))
total23 = sum(sum4)
total23/13
58.53846153846154
Getting the Amount of Expenses spent in the Autumn to produce PG-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['Budget'][i]))
total24 = sum(sum4)
total24
383600000
Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by PG-rated Drama Movies in a list for the Javascript graph below.
pg_total_tick = [total13//13, total16//9, total19//11, total22//13]
print(pg_total_tick)
[8398544, 11145717, 11981547, 10249162]
Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by PG-rated Drama Movies in a list for the Javascript graph below.
pg_total_pro = [total14//13, total17//9, total20//11, total23//13]
print(pg_total_pro)
[79, 65, 75, 58]
Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn on PG-rated Drama Movies in a list for the Javascript graph below.
pg_total_bud = [total15//13, total18//9, total21//11, total24//13]
print(pg_total_bud)
[14009384, 22722222, 20227272, 29507692]
Checking the season in which each movie in the NC-17 rated category was released. Based on the output below, 8 movies were released in 'Winter', 8 in 'Spring', 4 in 'Summer' and 14 in 'Autumn'.
collections.Counter(season_nc17)
Counter({1: 8, 2: 8, 4: 14, 3: 4})
Getting the index of the NC-17 rated movies that were released in Winter, Spring, Summer and Autumn.
index_one = []
index_two = []
index_three = []
index_four = []
for x, i in enumerate(df_data.System):
    if i == 'NC-17':
        if df_data['Season'][x] == 1: index_one.append(x)
        if df_data['Season'][x] == 2: index_two.append(x)
        if df_data['Season'][x] == 3: index_three.append(x)
        if df_data['Season'][x] == 4: index_four.append(x)
Getting the number of Tickets sold in the Winter by NC-17 rated Drama movies.
sum1 = []
for i in index_one:
sum1.append(int(df_data['No.Tickets'][i]))
total25 = sum(sum1)
total25
16479358
Getting the Average Net Profit Margin made in the Winter by NC-17 rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['NPM'][i]))
total26 = sum(sum1)
total26/8
57.875
Getting the Amount of Expenses spent in the Winter to produce NC-17 rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['Budget'][i]))
total27 = sum(sum1)
total27
58250000
Getting the number of Tickets sold in the Spring by NC-17 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['No.Tickets'][i]))
total28 = sum(sum2)
total28
22453884
Getting the Average Net Profit Margin made in the Spring by NC-17 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['NPM'][i]))
total29 = sum(sum2)
total29/8
62.875
Getting the Amount of Expenses spent in the Spring to produce NC-17 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['Budget'][i]))
total30 = sum(sum2)
total30
33165116
Getting the number of Tickets sold in the Summer by NC-17 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['No.Tickets'][i]))
total31 = sum(sum3)
total31
8314920
Getting the Average Net Profit Margin made in the Summer by NC-17 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['NPM'][i]))
total32 = sum(sum3)
total32/4
71.25
Getting the Amount of Expenses spent in the Summer to produce NC-17 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['Budget'][i]))
total33 = sum(sum3)
total33
37404765
Getting the number of Tickets sold in the Autumn by NC-17 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['No.Tickets'][i]))
total34 = sum(sum4)
total34
48856406
Getting the Average Net Profit Margin made in the Autumn by NC-17 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['NPM'][i]))
total35 = sum(sum4)
total35/14
72.5
Getting the Amount of Expenses spent in the Autumn to produce NC-17 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['Budget'][i]))
total36 = sum(sum4)
total36
72404940
Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by NC-17 rated Drama Movies in a list for the Javascript graph below.
nc_total_tick = [total25//8, total28//8, total31//4, total34//14]
print(nc_total_tick)
[2059919, 2806735, 2078730, 3489743]
Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by NC-17 rated Drama Movies in a list for the Javascript graph below.
nc_total_pro = [total26/8, total29/8, total32/4, total35/14]
print(nc_total_pro)
[57.875, 62.875, 71.25, 72.5]
Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn on NC-17 rated Drama Movies in a list for the Javascript graph below.
nc_total_bud = [total27//8, total30//8, total33//4, total36//14]
print(nc_total_bud)
[7281250, 4145639, 9351191, 5171781]
Checking the season in which each movie in the PG-13 rated category was released. Based on the output below, 20 movies were released in 'Winter', 14 in 'Spring', 10 in 'Summer' and 20 in 'Autumn'.
collections.Counter(season_pg13)
Counter({4: 20, 1: 20, 3: 10, 2: 14})
Getting the index of the PG-13 rated movies that were released in Winter, Spring, Summer and Autumn.
index_one = []
index_two = []
index_three = []
index_four = []
for x, i in enumerate(df_data.System):
    if i == 'PG-13':
        if df_data['Season'][x] == 1: index_one.append(x)
        if df_data['Season'][x] == 2: index_two.append(x)
        if df_data['Season'][x] == 3: index_three.append(x)
        if df_data['Season'][x] == 4: index_four.append(x)
Getting the number of Tickets sold in the Winter by PG-13 rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['No.Tickets'][i]))
total37 = sum(sum1)
total37
237229054
Getting the Average Net Profit Margin made in the Winter by PG-13 rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['NPM'][i]))
total38 = sum(sum1)
total38/20
68.45
Getting the Amount of Expenses spent in the Winter to produce PG-13 rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['Budget'][i]))
total39 = sum(sum1)
total39
499100000
Getting the number of Tickets sold in the Spring by PG-13 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['No.Tickets'][i]))
total40 = sum(sum2)
total40
103122577
Getting the Average Net Profit Margin made in the Spring by PG-13 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['NPM'][i]))
total41 = sum(sum2)
total41/14
65.0
Getting the Amount of Expenses spent in the Spring to produce PG-13 rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['Budget'][i]))
total42 = sum(sum2)
total42
271600000
Getting the number of Tickets sold in the Summer by PG-13 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['No.Tickets'][i]))
total43 = sum(sum3)
total43
101386726
Getting the Average Net Profit Margin made in the Summer by PG-13 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['NPM'][i]))
total44 = sum(sum3)
total44/10
72.3
Getting the Amount of Expenses spent in the Summer to produce PG-13 rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['Budget'][i]))
total45 = sum(sum3)
total45
190175000
Getting the number of Tickets sold in the Autumn by PG-13 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['No.Tickets'][i]))
total46 = sum(sum4)
total46
226553982
Getting the Average Net Profit Margin made in the Autumn by PG-13 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['NPM'][i]))
total47 = sum(sum4)
total47/20
65.05
Getting the Amount of Expenses spent in the Autumn to produce PG-13 rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['Budget'][i]))
total48 = sum(sum4)
total48
619650000
Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by PG-13 rated Drama Movies in a list for the Javascript graph below.
pg13_total_tick = [total37//20, total40//14, total43//10, total46//20]
print(pg13_total_tick)
[11861452, 7365898, 10138672, 11327699]
Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by PG-13 rated Drama Movies in a list for the Javascript graph below.
pg13_total_pro = [total38/20, total41/14, total44/10, total47/20]
print(pg13_total_pro)
[68.45, 65.0, 72.3, 65.05]
Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn on PG-13 rated Drama Movies in a list for the Javascript graph below.
pg13_total_bud = [total39//20, total42//14, total45//10, total48//20]
print(pg13_total_bud)
[24955000, 19400000, 19017500, 30982500]
Checking the season in which each movie in the G-rated category was released. Based on the output below, 3 movies were released in 'Winter', 8 in 'Spring', 7 in 'Summer' and 7 in 'Autumn'.
collections.Counter(season_g)
Counter({2: 8, 4: 7, 3: 7, 1: 3})
Getting the index of the G-rated movies that were released in Winter, Spring, Summer and Autumn.
index_one = []
index_two = []
index_three = []
index_four = []
for x, i in enumerate(df_data.System):
    if i == 'G':
        if df_data['Season'][x] == 1: index_one.append(x)
        if df_data['Season'][x] == 2: index_two.append(x)
        if df_data['Season'][x] == 3: index_three.append(x)
        if df_data['Season'][x] == 4: index_four.append(x)
Getting the number of Tickets sold in the Winter by G-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['No.Tickets'][i]))
total49 = sum(sum1)
total49
16707096
Getting the Average Net Profit Margin made in the Winter by G-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['NPM'][i]))
total50 = sum(sum1)
total50/3
64.66666666666667
Getting the Amount of Expenses spent in the Winter to produce G-rated Drama movies.
sum1 =[]
for i in index_one:
sum1.append(int(df_data['Budget'][i]))
total51 = sum(sum1)
total51
77666000
Getting the number of Tickets sold in the Spring by G-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['No.Tickets'][i]))
total52 = sum(sum2)
total52
82454882
Getting the Average Net Profit Margin made in the Spring by G-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['NPM'][i]))
total53 = sum(sum2)
total53/8
75.25
Getting the Amount of Expenses spent in the Spring to produce G-rated Drama movies.
sum2 =[]
for i in index_two:
sum2.append(int(df_data['Budget'][i]))
total54 = sum(sum2)
total54
85100000
Getting the number of Tickets sold in the Summer by G-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['No.Tickets'][i]))
total55 = sum(sum3)
total55
193789041
Getting the Average Net Profit Margin made in the Summer by G-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['NPM'][i]))
total56 = sum(sum3)
total56/7
73.14285714285714
Getting the Amount of Expenses spent in the Summer to produce G-rated Drama movies.
sum3 =[]
for i in index_three:
sum3.append(int(df_data['Budget'][i]))
total57 = sum(sum3)
total57
193858000
Getting the number of Tickets sold in the Autumn by G-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['No.Tickets'][i]))
total58 = sum(sum4)
total58
74232412
Getting the Average Net Profit Margin made in the Autumn by G-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['NPM'][i]))
total59 = sum(sum4)
total59/7
74.71428571428571
Getting the Amount of Expenses spent in the Autumn to produce G-rated Drama movies.
sum4 =[]
for i in index_four:
sum4.append(int(df_data['Budget'][i]))
total60 = sum(sum4)
total60
135850000
Putting the average number of Tickets sold per movie in Winter, Spring, Summer and Autumn by G-rated Drama Movies in a list for the Javascript graph below.
g_total_tick = [total49//3, total52//8, total55//7, total58//7]
print(g_total_tick)
[5569032, 10306860, 27684148, 10604630]
Putting the average Net Profit Margin made in Winter, Spring, Summer and Autumn by G-rated Drama Movies in a list for the Javascript graph below.
g_total_pro = [total50/3, total53/8, total56/7, total59/7]
print(g_total_pro)
[64.66666666666667, 75.25, 73.14285714285714, 74.71428571428571]
Putting the average Expenses spent per movie in Winter, Spring, Summer and Autumn on G-rated Drama Movies in a list for the Javascript graph below.
g_total_bud = [total51//3, total54//8, total57//7, total60//7]
print(g_total_bud)
[25888666, 10637500, 27694000, 19407142]
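The per-season lists above are later typed into the chart configuration by hand, with expenses entered as negative values. The following is a hedged sketch of producing a ready-to-paste array with the standard `json` module; the variable name `negated` is hypothetical, and the sample list repeats the R-rated average expenses printed earlier.

```python
import json

# Average expenses per season for R-rated movies, as printed earlier.
avg_expenses = [25550000.0, 11421363.636363637, 8300000.0, 15651568.75]

# Round to whole dollars and negate so the bars extend to the left.
negated = [-round(value) for value in avg_expenses]

# json.dumps yields a literal that is also valid inside a Javascript array.
print(json.dumps(negated))
```

Generating the literal from the Python lists avoids transcription slips when copying numbers into the chart options.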
This is the HTML script from the Highcharts library that will be used to visualize the Average Number of Tickets Sold, the Average Expenses and the Average Net Profit Margin for each of Winter, Spring, Summer and Autumn. This is done for the five System Ratings (R, PG, PG-13, NC-17 and G) of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe, to see which season is the best time to release movies per system rating, and more. The visualisation used is a 'Column Series Chart with positive and negative numbers'. This is done using Javascript and HTML below.
%%html
<script src="https://code.highcharts.com/highcharts.js" ></script>
<script src="https://cloud.highcharts.com/embed"></script>
<script src="https://code.highcharts.com/modules/data.js" ></script>
<script src="https://code.highcharts.com/modules/exporting.js" ></script>
<script src="https://code.highcharts.com/modules/export-data.js" ></script>
<script src="https://code.highcharts.com/modules/accessibility.js" ></script>
<figure class="highcharts-figure">
<table>
<tr>
<td><div id='v2' ></div></td>
<td><div id='v3' ></div></td>
<td><div id='v4' ></div></td>
<td><div id='v5' ></div></td>
<td><div id='v6' ></div></td>
</tr>
</table>
</figure>
%%js inline
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('v2', {
chart: {
height: 500,
width: 500,
type: 'bar',
},
title: {
text: 'System Rating R'
},
subtitle: {
text: 'Drama Movies'
},
xAxis: {
categories: ['Winter', 'Spring', 'Summer', 'Autumn']
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
},
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
stacking: 'normal',
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
legend: {
enabled: true,
verticalAlign: 'top',
symbolRadius: 3,
reversed: true
},
credits: {
enabled: true
},
tooltip:{
shared:true,
formatter: function () {
var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
point;
for(var i = this.points.length - 1; i >= 0; i--) {
point = this.points[i];
if (point) {
txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
}
}
return txt;
}
},
series: [{
name: 'Avg Net Profit Margin',
//data: [73, 51, 77, 59],
color: '#FFC300',
data:[{
name:'Avg Net Profit Margin',
y:73,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:51,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:77,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:59,
color: '#FFC300',
}],
tooltip: {
valueSuffix: '%',
color: '#581845'
},
stack: 'female'
},{
name: 'Avg Expenses',
//data:[-25550000, -11421364, -8300000, -15651570],
color: '#581845',
data:[{
//name:'System Rating: R',
y:-25550000,
color: '#581845',
},{
//name:'System Rating: R',
y:-11421364,
color: '#581845',
},{
//name:'System Rating: R',
y:-8300000,
color: '#581845',
},{
//name:'System Rating: R',
y:-15651570,
color: '#581845',
}],
tooltip: {
valuePrefix: '-$'
},
stack: 'male'
}, {
name: 'Avg No. Tickets Sold',
//data: [17186733, 2732688, 3396913, 5210935],
color: '#C70039',
data:[{
//name:'System Rating: R',
y:17186733,
color: '#C70039',
},{
//name:'System Rating: R',
y:2732688,
color: '#C70039',
},{
//name:'System Rating: R',
y:3396913,
color: '#C70039',
},{
//name:'System Rating: R',
y:5210935,
color: '#C70039',
}],
tooltip: {
valuePrefix: '$'
},
stack: 'male'
}]
});
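The `log2lin` and `lin2log` overrides in the chart cells let a logarithmic axis show negative values by taking the logarithm of the absolute value and restoring the sign, with a linear adjustment for magnitudes below 10. The following is a direct Python port of those two functions, offered only as a way to check the arithmetic outside the browser. It is a sketch and not part of Highcharts.

```python
import math


def log2lin(num: float) -> float:
    """Signed log10: map a data value to its position on the axis."""
    is_negative = num < 0
    adjusted = abs(num)
    if adjusted < 10:
        # Stretch [0, 10) onto [1, 10) so that log10 never sees zero.
        adjusted += (10 - adjusted) / 10
    result = math.log10(adjusted)
    return -result if is_negative else result


def lin2log(num: float) -> float:
    """Inverse of log2lin: map an axis position back to a data value."""
    is_negative = num < 0
    result = 10 ** abs(num)
    if result < 10:
        # Undo the stretch applied in log2lin for small magnitudes.
        result = (10 * (result - 1)) / (10 - 1)
    return -result if is_negative else result


# Round-trip a few values taken from the R-rated chart.
for value in (73, -25550000, 17186733):
    print(value, log2lin(value), lin2log(log2lin(value)))
```

Because the transform is symmetric around zero, a net profit margin of 73 and an expense of -25550000 land on opposite sides of the axis at distances of roughly 1.9 and 7.4, which is why bars of very different magnitude remain readable on one chart.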
%%html
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet" />
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('v3', {
chart: {
height: 500,
width: 500,
type: 'bar',
},
title: {
text: 'System Rating PG'
},
subtitle: {
text: 'Drama Movies'
},
xAxis: {
categories: ['Winter', 'Spring', 'Summer', 'Autumn']
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
}
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
stacking: 'normal',
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
legend: {
enabled: true,
verticalAlign: 'top',
symbolRadius: 3,
reversed: true
},
credits: {
enabled: true
},
tooltip:{
shared:true,
formatter: function () {
var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
point;
for(var i = this.points.length - 1; i >= 0; i--) {
point = this.points[i];
if (point) {
txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
}
}
return txt;
}
},
series: [{
name: 'Avg Net Profit Margin',
color: '#FFC300',
//data: [80, 66, 75, 60],
data:[{
name:'Avg Net Profit Margin',
y:80,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:66,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:75,
color: '#FFC300',
},{
name:'Avg Net Profit Margin',
y:60,
color: '#FFC300',
}],
tooltip: {
valueSuffix: '%'
},
stack: 'female'
},{
name: 'Avg Expenses',
color: '#581845',
//data: [-14009384, -22722222, -20227272, -29507692],
data:[{
//name:'System Rating: R',
y:-14009384,
color: '#581845',
},{
//name:'System Rating: R',
y:-22722222,
color: '#581845',
},{
//name:'System Rating: R',
y:-20227272,
color: '#581845',
},{
//name:'System Rating: R',
y:-29507692,
color: '#581845',
}],
stack: 'male'
}, {
name: 'Avg No. Tickets Sold',
color: '#C70039',
//data: [8398544, 11145717, 11981547, 10249162],
data:[{
//name:'System Rating: R',
y:8398544,
color: '#C70039',
},{
//name:'System Rating: R',
y:11145717,
color: '#C70039',
},{
//name:'System Rating: R',
y:11981547,
color: '#C70039',
},{
//name:'System Rating: R',
y:10249162,
color: '#C70039',
}],
stack: 'male'
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('v4', {
chart: {
height: 500,
width: 500,
type: 'bar',
},
title: {
text: 'System Rating NC-17'
},
subtitle: {
text: 'Drama Movies'
},
xAxis: {
categories: ['Winter', 'Spring', 'Summer', 'Autumn']
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
}
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
stacking: 'normal',
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
legend: {
enabled: true,
verticalAlign: 'top',
symbolRadius: 3,
reversed: true
},
credits: {
enabled: true
},
tooltip:{
shared:true,
formatter: function () {
var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
point;
for(var i = this.points.length - 1; i >= 0; i--) {
point = this.points[i];
if (point) {
txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
}
}
return txt;
}
},
series: [{
name: 'Avg Net Profit Margin',
color: '#FFC300',
//data: [56, 63, 71, 73],
data:[{
//name:'System Rating: R',
y:56,
color: '#FFC300',
},{
//name:'System Rating: R',
y:63,
color: '#FFC300',
},{
//name:'System Rating: R',
y:71,
color: '#FFC300',
},{
//name:'System Rating: R',
y:73,
color: '#FFC300',
}],
tooltip: {
valueSuffix: '%'
},
stack: 'female'
},{
name: 'Avg Expenses',
color: '#581845',
//data: [-24955000, -19400000, -19017500, -30982500],
data:[{
//name:'System Rating: R',
y:-24955000,
color: '#581845',
},{
//name:'System Rating: R',
y:-19400000,
color: '#581845',
},{
//name:'System Rating: R',
y:-19017500,
color: '#581845',
},{
//name:'System Rating: R',
y:-30982500,
color: '#581845',
}],
stack: 'male'
}, {
name: 'Avg No. Tickets Sold',
color: '#C70039',
//data: [2059919, 2806735, 2078730, 3489743],
data:[{
//name:'System Rating: R',
y:2059919,
color: '#C70039',
},{
//name:'System Rating: R',
y:2806735,
color: '#C70039',
},{
//name:'System Rating: R',
y:2078730,
color: '#C70039',
},{
//name:'System Rating: R',
y:3489743,
color: '#C70039',
}],
stack: 'male'
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('v5', {
chart: {
height: 500,
width: 500,
type: 'bar',
},
title: {
text: 'System Rating PG-13'
},
subtitle: {
text: 'Drama Movies'
},
xAxis: {
categories: ['Winter', 'Spring', 'Summer', 'Autumn']
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
}
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
stacking: 'normal',
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
legend: {
enabled: true,
verticalAlign: 'top',
symbolRadius: 3,
reversed: true
},
credits: {
enabled: true
},
tooltip:{
shared:true,
formatter: function () {
var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
point;
for(var i = this.points.length - 1; i >= 0; i--) {
point = this.points[i];
if (point) {
txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
}
}
return txt;
}
},
series: [{
name: 'Avg Net Profit Margin',
color: '#FFC300',
//data: [69, 65, 72, 65],
data:[{
//name:'System Rating: R',
y:69,
color: '#FFC300',
},{
//name:'System Rating: R',
y:65,
color: '#FFC300',
},{
//name:'System Rating: R',
y:72,
color: '#FFC300',
},{
//name:'System Rating: R',
y:65,
color: '#FFC300',
}],
tooltip: {
valueSuffix: '%'
},
stack: 'female'
},{
name: 'Avg Expenses',
color: '#581845',
//data: [-24955000, -19400000, -19017500, -30982500],
data:[{
//name:'System Rating: R',
y:-24955000,
color: '#581845',
},{
//name:'System Rating: R',
y:-19400000,
color: '#581845',
},{
//name:'System Rating: R',
y:-19017500,
color: '#581845',
},{
//name:'System Rating: R',
y:-30982500,
color: '#581845',
}],
stack: 'male'
}, {
name: 'Avg No. Tickets Sold',
color: '#C70039',
//data: [11861452, 7365898, 10138672, 11327699],
data:[{
//name:'System Rating: R',
y:11861452,
color: '#C70039',
},{
//name:'System Rating: R',
y:7365898,
color: '#C70039',
},{
//name:'System Rating: R',
y:10138672,
color: '#C70039',
},{
//name:'System Rating: R',
y:11327699,
color: '#C70039',
}],
stack: 'male'
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('v6', {
chart: {
height: 500,
width: 500,
type: 'bar',
},
title: {
text: 'System Rating G'
},
subtitle: {
text: 'Drama Movies'
},
xAxis: {
categories: ['Winter', 'Spring', 'Summer', 'Autumn']
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
}
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
stacking: 'normal',
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
legend: {
enabled: true,
verticalAlign: 'top',
symbolRadius: 3,
reversed: true
},
credits: {
enabled: true
},
tooltip:{
shared:true,
formatter: function () {
var txt = '<span style="font-size: 10px">' + this.x + '</span><br/>',
point;
for(var i = this.points.length - 1; i >= 0; i--) {
point = this.points[i];
if (point) {
txt += '<span style="color:' + point.color + '">●</span> ' + point.series.name + ': <b>' + point.y + '</b><br/>';
}
}
return txt;
}
},
series: [{
name: 'Avg Net Profit Margin',
color: '#FFC300',
//data: [65, 75, 73, 75] ,
data:[{
//name:'System Rating: R',
y:65,
color: '#FFC300',
},{
//name:'System Rating: R',
y:75,
color: '#FFC300',
},{
//name:'System Rating: R',
y:73,
color: '#FFC300',
},{
//name:'System Rating: R',
y:75,
color: '#FFC300',
}],
tooltip: {
valueSuffix: '%'
},
stack: 'female'
},{
name: 'Avg Expenses',
color: '#581845',
//data: [-25888666, -10637500, -27694000, -19407142],
data:[{
//name:'System Rating: R',
y:-25888666,
color: '#581845',
},{
//name:'System Rating: R',
y:-10637500,
color: '#581845',
},{
//name:'System Rating: R',
y:-27694000,
color: '#581845',
},{
//name:'System Rating: R',
y:-19407142,
color: '#581845',
}],
stack: 'male'
}, {
name: 'Avg No. Tickets Sold',
color: '#C70039',
//data: [5569032, 10306860, 27684148, 10604630],
data:[{
//name:'System Rating: R',
y:5569032,
color: '#C70039',
},{
//name:'System Rating: R',
y:10306860,
color: '#C70039',
},{
//name:'System Rating: R',
y:27684148,
color: '#C70039',
},{
//name:'System Rating: R',
y:10604630,
color: '#C70039',
}],
stack: 'male'
}]
});
This is the HTML script from the Highcharts library to visualize the Total Number of Tickets Sold in each system rating (R, PG, PG-13, NC-17 and G) of all the movies in the Drama genre within the 'Drama_DataFrame' dataframe. The visualisation used is a ring chart combined with a pie chart: the outer ring shows each rating's percentage of tickets sold and the inner pie shows the individual movies. This will be done using JavaScript and HTML below.
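The outer ring needs each rating's share of all tickets sold as a whole-number percentage. A minimal sketch of that calculation is shown here. The function name `percent_shares` and the sample totals are hypothetical and are not read from 'Drama_DataFrame'.

```python
def percent_shares(totals):
    """Map each rating to its rounded percentage of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no tickets sold in any rating")
    return {
        rating: round(100 * count / grand_total)
        for rating, count in totals.items()
    }


# Hypothetical ticket totals, for illustration only.
example = {"R": 1, "PG-13": 1, "PG": 1}
shares = percent_shares(example)
print(shares, sum(shares.values()))  # {'R': 33, 'PG-13': 33, 'PG': 33} 99
```

Rounding each share on its own means the results do not always add up to exactly 100. The five percentages used in the chart below (21, 33, 24, 5 and 18) add up to 101, which is consistent with this kind of rounding.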
%%html
<script type="text/javascript" src="js/script.js"></script>
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/series-label.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<script src="https://code.highcharts.com/modules/export-data.js"></script>
<script src="https://code.highcharts.com/modules/accessibility.js"></script>
<div id='harph' style="height:700px"></div>
%%js
var chart = new Highcharts.Chart({
chart:{
renderTo:'harph',
height:593,
width:1000
},
title:{
text:'Number of Tickets Sold'
},
subtitle:{
text:'Drama Genre'
},
series:[{
type:'pie',
name: 'Percentage of Tickets Sold',
tooltip:{
valueSuffix:'%'
},
shadow: true,
data:[{
name:'System Rating: R',
y:21,
sliced:true,
selected:true,
color: '#581845',
},{
name:'System Rating: PG-13',
y:33,
sliced:true,
color:'#900C3F',
},{
name:'System Rating: PG',
y:24,
sliced:true,
color:'#C70039',
},{
name:'System Rating: NC-17',
y:5,
sliced:true,
color:'#FF5733',
},{
name:'System Rating: G',
y:18,
sliced:true,
color:'#FFAA00',
}],
innerSize:'70%',
size:'97%'
},{
type:'pie',
shadow: true,
dataLabels: { enabled: false },
name: 'No. of Tickets Sold',
data:[{
name: ' Fifty Shades of Grey ' ,
y: 0.126950353271215 ,
color:"#581845",
},{
name: ' Django Unchained ' ,
y: 0.1000372823968936 ,
color:"#581845",
},{
name: ' Fifty Shades Darker ' ,
y: 0.08479655719100986 ,
color:"#581845",
},{
name: ' Fifty Shades Freed ' ,
y: 0.08256260817064842 ,
color:"#581845",
},{
name: ' Gone Girl ' ,
y: 0.08194376649358207 ,
color:"#581845",
},{
name: ' Black Swan ' ,
y: 0.07365072819242854 ,
color:"#581845",
},{
name: ' Flight ' ,
y: 0.035697055171768834 ,
color:"#581845",
},{
name: ' The Wolfman ' ,
y: 0.03171198361362964 ,
color:"#581845",
},{
name: ' Zero Dark Thirty ' ,
y: 0.02992846528333498 ,
color:"#581845",
},{
name: ' Priest ' ,
y: 0.018710016439102733 ,
color:"#581845",
},{
name: ' The Ides of March ' ,
y: 0.017283074157099485 ,
color:"#581845",
},{
name: ' Manchester by the Sea ' ,
y: 0.01728261837935836 ,
color:"#581845",
},{
name: ' Fame ' ,
y: 0.0171665551334068 ,
color:"#581845",
},{
name: ' Crimson Peak ' ,
y: 0.016667425147527084 ,
color:"#581845",
},{
name: ' Hereditary ' ,
y: 0.015592912448024023 ,
color:"#581845",
},{
name: ' Boyhood ' ,
y: 0.01273355188120584 ,
color:"#581845",
},{
name: ' Quartet ' ,
y: 0.012490297742501055 ,
color:"#581845",
},{
name: ' Ordinary People ' ,
y: 0.012176362481024666 ,
color:"#581845",
},{
name: ' Downsizing ' ,
y: 0.012108785093504838 ,
color:"#581845",
},{
name: ' The Master ' ,
y: 0.011260471551964184 ,
color:"#581845",
},{
name: ' The Debt ' ,
y: 0.010361506651894932 ,
color:"#581845",
},{
name: ' Carol ' ,
y: 0.009525425740264925 ,
color:"#581845",
},{
name: ' The Witch ' ,
y: 0.008994277923897528 ,
color:"#581845",
},{
name: ' Whiplash ' ,
y: 0.0086640102561464 ,
color:"#581845",
},{
name: ' Ex Machina ' ,
y: 0.008528244071941818 ,
color:"#581845",
},{
name: ' For Colored Girls ' ,
y: 0.008452536054181487 ,
color:"#581845",
},{
name: ' Room ' ,
y: 0.008062325831900929 ,
color:"#581845",
},{
name: ' Arbitrage ' ,
y: 0.00796626344721367 ,
color:"#581845",
},{
name: ' Endless Love ' ,
y: 0.007718911755450848 ,
color:"#581845",
},{
name: ' Nocturnal Animals ' ,
y: 0.007203218139466748 ,
color:"#581845",
},{
name: ' The Water Diviner ' ,
y: 0.006904416922301841 ,
color:"#581845",
},{
name: ' Let Me In ' ,
y: 0.006285375147690608 ,
color:"#581845",
},{
name: ' Biutiful ' ,
y: 0.005488791268114878 ,
color:"#581845",
},{
name: ' Before Midnight ' ,
y: 0.005169615674268552 ,
color:"#581845",
},{
name: ' Melancholia ' ,
y: 0.004850653517803438 ,
color:"#581845",
},{
name: ' Buried ' ,
y: 0.00472903645332829 ,
color:"#581845",
},{
name: ' Margin Call ' ,
y: 0.004542932396748761 ,
color:"#581845",
},{
name: ' If Beale Street Could Talk ' ,
y: 0.004415301289396786 ,
color:"#581845",
},{
name: ' Mommy ' ,
y: 0.0038987894967847116 ,
color:"#581845",
},{
name: ' Addicted ' ,
y: 0.00389061662372918 ,
color:"#581845",
},{
name: ' Silent House ' ,
y: 0.0036930803274185455 ,
color:"#581845",
},{
name: ' Blue Valentine ' ,
y: 0.0036831821688648927 ,
color:"#581845",
},{
name: ' Winter\'s Bone ' ,
y: 0.0035865372779914128 ,
color:"#581845",
},{
name: ' Unsane ' ,
y: 0.0031670839111789142 ,
color:"#581845",
},{
name: ' Rich and Famous ' ,
y: 0.0028902978705634837 ,
color:"#581845",
},{
name: ' Stoker ' ,
y: 0.0026757288265710135 ,
color:"#581845",
},{
name: ' Chloe ' ,
y: 0.0026304222957969038 ,
color:"#581845",
},{
name: ' The Florida Project ' ,
y: 0.0025112953341025483 ,
color:"#581845",
},{
name: ' Never Let Me Go ' ,
y: 0.0024842599324825083 ,
color:"#581845",
},{
name: ' Raggedy Man ' ,
y: 0.002445636659707563 ,
color:"#581845",
},{
name: ' We Need to Talk About Kevin ' ,
y: 0.0023934512200015122 ,
color:"#581845",
},{
name: ' We Are Your Friends ' ,
y: 0.0022574160157643607 ,
color:"#581845",
},{
name: ' Pennies from Heaven ' ,
y: 0.002039058458255398 ,
color:"#581845",
},{
name: ' The Homesman ' ,
y: 0.0018270173132466435 ,
color:"#581845",
},{
name: ' Miss Sloane ' ,
y: 0.0017163100115798451 ,
color:"#581845",
},{
name: ' The Immigrant ' ,
y: 0.001686379865477133 ,
color:"#581845",
},{
name: ' Tulip Fever ' ,
y: 0.0015102406666328858 ,
color:"#581845",
},{
name: ' Knock Knock ' ,
y: 0.0014070236830629552 ,
color:"#581845",
},{
name: ' Martha Marcy May Marlene ' ,
y: 0.0012092361531681874 ,
color:"#581845",
},{
name: ' Take Shelter ' ,
y: 0.0011054322167999271 ,
color:"#581845",
},{
name: ' Stone ' ,
y: 0.000903778357676767 ,
color:"#581845",
},{
name: ' By the Sea ' ,
y: 0.0008287929143840789 ,
color:"#581845",
},{
name: ' Zoot Suit ' ,
y: 0.000723926237721873 ,
color:"#581845",
},{
name: ' Everything Must Go ' ,
y: 0.0006271968612183303 ,
color:"#581845",
},{
name: ' A Ghost Story ' ,
y: 0.0006158068643022558 ,
color:"#581845",
},{
name: ' The Hand ' ,
y: 0.000544171943233367 ,
color:"#581845",
},{
name: ' Coriolanus ' ,
y: 0.0004845962342028908 ,
color:"#581845",
},{
name: ' Locke ' ,
y: 0.000464313013069698 ,
color:"#581845",
},{
name: ' Ghost Story ' ,
y: 0.0004339181960016415 ,
color:"#581845",
},{
name: ' Palo Alto ' ,
y: 0.00025708310236240476 ,
color:"#581845",
},{
name: ' I Origins ' ,
y: 0.00018951460806679334 ,
color:"#581845",
},{
name: ' Stake Land ' ,
y: 0.00015106919977619045 ,
color:"#581845",
},{
name: ' One from the Heart ' ,
y: 0.0001415801295365251 ,
color:"#581845",
},{
name: ' The Reluctant Fundamentalist ' ,
y: 0.00011755286100792544 ,
color:"#581845",
},{
name: ' Sound of My Voice ' ,
y: 9.547987850103754e-05 ,
color:"#581845",
},{
name: ' Hesher ' ,
y: 8.514150534863738e-05 ,
color:"#581845",
},{
name: ' The Canyons ' ,
y: 1.3868983166596162e-05 ,
color:"#581845",
},
{
name: ' Gravity ' ,
y: 0.10042430006814071 ,
sliced:true,
color:"#900C3F",
},{
name: ' Sing ' ,
y: 0.09184777334318545 ,
sliced:true,
color:"#900C3F",
},{
name: ' A Quiet Place ' ,
y: 0.04842760737950036 ,
sliced:true,
color:"#900C3F",
},{
name: ' True Grit ' ,
y: 0.03652123784321127 ,
sliced:true,
color:"#900C3F",
},{
name: ' Creed II ' ,
y: 0.030920888022590957 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Help ' ,
y: 0.0308526277418438 ,
sliced:true,
color:"#900C3F",
},{
name: ' Me Before You ' ,
y: 0.03014981553669598 ,
sliced:true,
color:"#900C3F",
},{
name: ' Arrival ' ,
y: 0.029406105359216384 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Vow ' ,
y: 0.02860848125707642 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Post ' ,
y: 0.026021608866616704 ,
sliced:true,
color:"#900C3F",
},{
name: ' Creed ' ,
y: 0.025126763953606857 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Impossible ' ,
y: 0.024551032089162032 ,
sliced:true,
color:"#900C3F",
},{
name: ' Step Up Revolution ' ,
y: 0.023966418802457628 ,
sliced:true,
color:"#900C3F",
},{
name: ' Bridge of Spies ' ,
y: 0.023524309274997962 ,
sliced:true,
color:"#900C3F",
},{
name: ' Lights Out ' ,
y: 0.02154219152868317 ,
sliced:true,
color:"#900C3F",
},{
name: ' Dear John ' ,
y: 0.020561688301883676 ,
sliced:true,
color:"#900C3F",
},{
name: ' Contagion ' ,
y: 0.019912856613967363 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Woman in Black ' ,
y: 0.01866848901001518 ,
sliced:true,
color:"#900C3F",
},{
name: ' Water for Elephants ' ,
y: 0.01691012954105202 ,
sliced:true,
color:"#900C3F",
},{
name: ' Hereafter ' ,
y: 0.0157303625217635 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Rite ' ,
y: 0.014063191445323746 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Lucky One ' ,
y: 0.013989337388600871 ,
sliced:true,
color:"#900C3F",
},{
name: ' Safe Haven ' ,
y: 0.013615422996981813 ,
sliced:true,
color:"#900C3F",
},{
name: ' Burlesque ' ,
y: 0.01310899083645976 ,
sliced:true,
color:"#900C3F",
},{
name: ' Collateral Beauty ' ,
y: 0.012349894879717762 ,
sliced:true,
color:"#900C3F",
},{
name: ' Rings ' ,
y: 0.012003641015419623 ,
sliced:true,
color:"#900C3F",
},{
name: ' Ouija: Origin of Evil ' ,
y: 0.011846510053157636 ,
sliced:true,
color:"#900C3F",
},{
name: ' If I Stay ' ,
y: 0.011343345271615188 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Book Thief ' ,
y: 0.011014803583575563 ,
sliced:true,
color:"#900C3F",
},{
name: ' Anna Karenina ' ,
y: 0.010279088857626476 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Age of Adaline ' ,
y: 0.009986647581467405 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Giver ' ,
y: 0.009632789135078054 ,
sliced:true,
color:"#900C3F",
},{
name: ' Fences ' ,
y: 0.009306004911850674 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Longest Ride ' ,
y: 0.009236524249854158 ,
sliced:true,
color:"#900C3F",
},{
name: ' Brooklyn ' ,
y: 0.008986542976119461 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Tree of Life ' ,
y: 0.008935250772031564 ,
sliced:true,
color:"#900C3F",
},{
name: ' Everything, Everything ' ,
y: 0.008918068441012986 ,
sliced:true,
color:"#900C3F",
},{
name: ' One Day ' ,
y: 0.008565641734903134 ,
sliced:true,
color:"#900C3F",
},{
name: ' Remember Me ' ,
y: 0.008180190904166454 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Roommate ' ,
y: 0.007606856372282654 ,
sliced:true,
color:"#900C3F",
},{
name: ' Charlie St. Cloud ' ,
y: 0.0070179999806649915 ,
sliced:true,
color:"#900C3F",
},{
name: ' Trouble with the Curve ' ,
y: 0.006922574273886692 ,
sliced:true,
color:"#900C3F",
},{
name: ' Still Alice ' ,
y: 0.006036704881334775 ,
sliced:true,
color:"#900C3F",
},{
name: ' Dream House ' ,
y: 0.006028389495929878 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Best of Me ' ,
y: 0.005944026841948274 ,
sliced:true,
color:"#900C3F",
},{
name: ' Beastly ' ,
y: 0.005505212199095423 ,
sliced:true,
color:"#900C3F",
},{
name: ' Gifted ' ,
y: 0.005351242936297972 ,
sliced:true,
color:"#900C3F",
},{
name: ' Amour ' ,
y: 0.005325529517850589 ,
sliced:true,
color:"#900C3F",
},{
name: ' Courageous ' ,
y: 0.005093735254359923 ,
sliced:true,
color:"#900C3F",
},{
name: ' Suffragette ' ,
y: 0.004928561067635958 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Perks of Being a Wallflower ' ,
y: 0.0047873254625720495 ,
sliced:true,
color:"#900C3F",
},{
name: ' Project Almanac ' ,
y: 0.004764183096436486 ,
sliced:true,
color:"#900C3F",
},{
name: ' Mud ' ,
y: 0.004568389355969665 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Bye Bye Man ' ,
y: 0.0045149372363135045 ,
sliced:true,
color:"#900C3F",
},{
name: ' Victor Frankenstein ' ,
y: 0.004505764833471335 ,
sliced:true,
color:"#900C3F",
},{
name: ' Draft Day ' ,
y: 0.00432091398964024 ,
sliced:true,
color:"#900C3F",
},{
name: ' Upside Down ' ,
y: 0.0038199583442621154 ,
sliced:true,
color:"#900C3F",
},{
name: ' Wish Upon ' ,
y: 0.0033987313785130402 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Light Between Oceans ' ,
y: 0.003225647152469375 ,
sliced:true,
color:"#900C3F",
},{
name: ' Black or White ' ,
y: 0.0031806667659938295 ,
sliced:true,
color:"#900C3F",
},{
name: ' Country Strong ' ,
y: 0.002982477140630577 ,
sliced:true,
color:"#900C3F",
},{
name: ' Before I Fall ' ,
y: 0.0027426990069261224 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Space Between Us ' ,
y: 0.002385953917344334 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Words ' ,
y: 0.0023697849515387473 ,
sliced:true,
color:"#900C3F",
},{
name: ' Paranoia ' ,
y: 0.0023655954102153195 ,
sliced:true,
color:"#900C3F",
},{
name: ' Anonymous ' ,
y: 0.0022895553799615618 ,
sliced:true,
color:"#900C3F",
},{
name: ' Ida ' ,
y: 0.002214689405690285 ,
sliced:true,
color:"#900C3F",
},{
name: ' Labor Day ' ,
y: 0.0020542085475670634 ,
sliced:true,
color:"#900C3F",
},{
name: ' Midnight Special ' ,
y: 0.001111842596726238 ,
sliced:true,
color:"#900C3F",
},{
name: ' Rabbit Hole ' ,
y: 0.0008982802210818929 ,
sliced:true,
color:"#900C3F",
},{
name: ' Mustang ' ,
y: 0.0008038273449080661 ,
sliced:true,
color:"#900C3F",
},{
name: ' The Beaver ' ,
y: 0.0007304973427667674 ,
sliced:true,
color:"#900C3F",
},{
name: ' Like Crazy ' ,
y: 0.0005397472657314678 ,
sliced:true,
color:"#900C3F",
},{
name: ' Another Earth ' ,
y: 0.0003044120146536895 ,
sliced:true,
color:"#900C3F",
},{
name: ' Anna ' ,
y: 0.00017371975079867003 ,
sliced:true,
color:"#900C3F",
},{
name: ' Maggie ' ,
y: 0.00014878517590070093 ,
sliced:true,
color:"#900C3F",
}
,{
name: ' Tex ' ,
y: 0.10992102523245781 ,
color:"#C70039",
},{
name: ' Cinderella ' ,
y: 0.10851702629705251 ,
color:"#C70039",
},{
name: ' Wonder ' ,
y: 0.06121391899642231 ,
color:"#C70039",
},{
name: ' Wonder ' ,
y: 0.06094720207717018 ,
color:"#C70039",
},{
name: ' Little Women ' ,
y: 0.043338915265064594 ,
color:"#C70039",
},{
name: ' Hugo ' ,
y: 0.036025077981249466 ,
color:"#C70039",
},{
name: ' Contact ' ,
y: 0.03423881834270405 ,
color:"#C70039",
},{
name: ' Resurrection ' ,
y: 0.031473064673483604 ,
color:"#C70039",
},{
name: ' Phenomenon ' ,
y: 0.03042038310878855 ,
color:"#C70039",
},{
name: ' Bridge to Terabithia ' ,
y: 0.027529273427924796 ,
color:"#C70039",
},{
name: ' Sense and Sensibility ' ,
y: 0.0269281584279092 ,
color:"#C70039",
},{
name: ' Forever Young ' ,
y: 0.02560226914729842 ,
color:"#C70039",
},{
name: ' Rocky III ' ,
y: 0.025021318835561402 ,
color:"#C70039",
},{
name: ' On Golden Pond ' ,
y: 0.023867369638086482 ,
color:"#C70039",
},{
name: ' The Lake House ' ,
y: 0.022975921543411725 ,
color:"#C70039",
},{
name: ' Mr. Holland\'s Opus ' ,
y: 0.021263155570788162 ,
color:"#C70039",
},{
name: ' Dolphin Tale ' ,
y: 0.019222026117505144 ,
color:"#C70039",
},{
name: ' The Last Song ' ,
y: 0.018543779884263614 ,
color:"#C70039",
},{
name: ' The Last Song ' ,
y: 0.01783509453584228 ,
color:"#C70039",
},{
name: ' Footloose ' ,
y: 0.01600868559832901 ,
color:"#C70039",
},{
name: ' War Room ' ,
y: 0.014803758436182365 ,
color:"#C70039",
},{
name: ' War Room ' ,
y: 0.01480142543096974 ,
color:"#C70039",
},{
name: ' Staying Alive ' ,
y: 0.012984128419475586 ,
color:"#C70039",
},{
name: ' God\'s Not Dead ' ,
y: 0.012939149039390006 ,
color:"#C70039",
},{
name: ' August Rush ' ,
y: 0.012926721684865472 ,
color:"#C70039",
},{
name: ' The Remains of the Day ' ,
y: 0.012796507580034979 ,
color:"#C70039",
},{
name: ' The Natural ' ,
y: 0.009604138096565115 ,
color:"#C70039",
},{
name: ' A Walk to Remember ' ,
y: 0.009503078553444008 ,
color:"#C70039",
},{
name: ' Urban Cowboy ' ,
y: 0.00938770284197271 ,
color:"#C70039",
},{
name: ' We Are Marshall ' ,
y: 0.00871282606051339 ,
color:"#C70039",
},{
name: ' A River Runs Through It ' ,
y: 0.008691803002392428 ,
color:"#C70039",
},{
name: ' Absence of Malice ' ,
y: 0.00814690222317329 ,
color:"#C70039",
},{
name: ' Dreamer ' ,
y: 0.007751685937913325 ,
color:"#C70039",
},{
name: ' Overcomer ' ,
y: 0.0076238828719174916 ,
color:"#C70039",
},{
name: ' The Majestic ' ,
y: 0.007464482191583959 ,
color:"#C70039",
},{
name: ' Taps ' ,
y: 0.007174301162444658 ,
color:"#C70039",
},{
name: ' The Indian in the Cupboard ' ,
y: 0.007134299927272464 ,
color:"#C70039",
},{
name: ' Fireproof ' ,
y: 0.006697545744744855 ,
color:"#C70039",
},{
name: ' The Age of Innocence ' ,
y: 0.006453868752613964 ,
color:"#C70039",
},{
name: ' The Jazz Singer ' ,
y: 0.0054259378521386 ,
color:"#C70039",
},{
name: ' Tuck Everlasting ' ,
y: 0.0038705917063661553 ,
color:"#C70039",
},{
name: ' Akeelah and the Bee ' ,
y: 0.003791317549827424 ,
color:"#C70039",
},{
name: ' Honeysuckle Rose ' ,
y: 0.0035645778554022458 ,
color:"#C70039",
},{
name: ' Extraordinary Measures ' ,
y: 0.0031667604494077946 ,
color:"#C70039",
},{
name: ' Pure Country ' ,
y: 0.0030341993333299544 ,
color:"#C70039",
},{
name: ' The Night the Lights Went Out in Georgia ' ,
y: 0.002986036581637784 ,
color:"#C70039",
},{
name: ' Ragtime ' ,
y: 0.002985442325593059 ,
color:"#C70039",
},{
name: ' Music of the Heart ' ,
y: 0.002973159033139973 ,
color:"#C70039",
},{
name: ' The Spanish Prisoner ' ,
y: 0.002768218731331894 ,
color:"#C70039",
},{
name: ' The Lunchbox ' ,
y: 0.002447354481836171 ,
color:"#C70039",
},{
name: ' Gettysburg ' ,
y: 0.002154920481968384 ,
color:"#C70039",
},{
name: ' Somewhere in Time ' ,
y: 0.0019427570679668466 ,
color:"#C70039",
},{
name: ' What If... ' ,
y: 0.0017059930544033786 ,
color:"#C70039",
},{
name: ' Tender Mercies ' ,
y: 0.001689351884288976 ,
color:"#C70039",
},{
name: ' Three Wishes ' ,
y: 0.0014057056707795462 ,
color:"#C70039",
},{
name: ' Six Weeks ' ,
y: 0.0013341788523053774 ,
color:"#C70039",
},{
name: ' The Secret of Roan Inish ' ,
y: 0.0012208900400079781 ,
color:"#C70039",
},{
name: ' Eddie and the Cruisers ' ,
y: 0.0009577706708178526 ,
color:"#C70039",
},{
name: ' Fluke ' ,
y: 0.0007978977870279055 ,
color:"#C70039",
},{
name: ' The Ultimate Gift ' ,
y: 0.0006880444549621317 ,
color:"#C70039",
},{
name: ' Looker ' ,
y: 0.000656528875970674 ,
color:"#C70039",
},{
name: ' Newsies ' ,
y: 0.000564139068343821 ,
color:"#C70039",
},{
name: ' Table for Five ' ,
y: 0.0004802069048282557 ,
color:"#C70039",
},{
name: ' Testament ' ,
y: 0.0004091542906726049 ,
color:"#C70039",
},{
name: ' Man, Woman and Child ' ,
y: 0.0003413290670898207 ,
color:"#C70039",
},{
name: ' Cattle Annie and Little Britches ' ,
y: 0.00010701010701676988 ,
color:"#C70039",
},{
name: ' Five Days One Summer ' ,
y: 3.983316275550381e-05 ,
color:"#C70039",
}
,{
name: ' Hell ' ,
y: 0.00520639197113044 ,
color:"#FF5733",
},{
name: ' Crash ' ,
y: 0.09741626549901868 ,
color:"#FF5733",
},{
name: ' Crash ' ,
y: 0.09475588094154686 ,
color:"#FF5733",
},{
name: ' Lust, Caution ' ,
y: 0.06460065143400773 ,
color:"#FF5733",
},{
name: ' Se, jie ' ,
y: 0.06274762192347601 ,
color:"#FF5733",
},{
name: ' Lust, Caution ' ,
y: 0.06274762192347601 ,
color:"#FF5733",
},{
name: ' Natural Born Killers ' ,
y: 0.0484164223116735 ,
color:"#FF5733",
},{
name: ' Showgirls ' ,
y: 0.036348982740728945 ,
color:"#FF5733",
},{
name: ' Last Tango in Paris ' ,
y: 0.034805467094213366 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Shame ' ,
y: 0.019654866958915027 ,
color:"#FF5733",
},{
name: ' Kids ' ,
y: 0.019654269980860305 ,
color:"#FF5733",
},{
name: ' Kids ' ,
y: 0.019654269980860305 ,
color:"#FF5733",
},{
name: ' Showgirls ' ,
y: 0.01959508249533823 ,
color:"#FF5733",
},{
name: ' Blue Is the Warmest Colour ' ,
y: 0.018743031123720486 ,
color:"#FF5733",
},{
name: ' Blue Is the Warmest Colour ' ,
y: 0.018743031123720486 ,
color:"#FF5733",
},{
name: ' Matador ' ,
y: 0.016711794035176298 ,
color:"#FF5733",
},{
name: ' Blue Valentine ' ,
y: 0.015951099563287444 ,
color:"#FF5733",
},{
name: ' The Dreamers ' ,
y: 0.01473872379225418 ,
color:"#FF5733",
},{
name: ' The Dreamers ' ,
y: 0.014559678519229444 ,
color:"#FF5733",
},{
name: ' Beyond the Valley of the Dolls ' ,
y: 0.00866581047175382 ,
color:"#FF5733",
},{
name: ' Happiness 1998 ' ,
y: 0.005533071842823304 ,
color:"#FF5733",
},{
name: ' Killer Joe ' ,
y: 0.0044861071363392156 ,
color:"#FF5733",
},{
name: ' Clerks ' ,
y: 0.003749638419058066 ,
color:"#FF5733",
},{
name: ' Elles ' ,
y: 0.0036803119352840355 ,
color:"#FF5733",
},{
name: ' Arabian Nights ' ,
y: 0.003325187022151564 ,
color:"#FF5733",
},{
name: ' Frontier(s) ' ,
y: 0.0026801811200606253 ,
color:"#FF5733",
},{
name: ' The Evil Dead ' ,
y: 0.0025630963919089293 ,
color:"#FF5733",
},{
name: ' Young Adam ' ,
y: 0.002466694064749819 ,
color:"#FF5733",
},{
name: ' Two Girls and a Guy ' ,
y: 0.002229067912936027 ,
color:"#FF5733",
},{
name: ' Nymphomaniac: Vol. I ' ,
y: 0.002016534096777114 ,
color:"#FF5733",
},{
name: ' Bad Lieutenant ' ,
y: 0.001963210476340922 ,
color:"#FF5733",
},{
name: ' A Dirty Shame ' ,
y: 0.0018430927145241121 ,
color:"#FF5733",
},{
name: ' Wide Sargasso Sea ' ,
y: 0.0015548197148420703 ,
color:"#FF5733",
},{
name: ' Law of Desire ' ,
y: 0.001416195633328915 ,
color:"#FF5733",
},{
name: ' Queen of Hearts ' ,
y: 0.0011909134470982216 ,
color:"#FF5733",
},{
name: ' Ma mère ' ,
y: 0.0009841953526336853 ,
color:"#FF5733",
},{
name: ' Ma Mère ' ,
y: 0.0009841953526336853 ,
color:"#FF5733",
},{
name: ' Whore ' ,
y: 0.0009709559199685058 ,
color:"#FF5733",
},{
name: ' Whore 1991 ' ,
y: 0.0009709559199685058 ,
color:"#FF5733",
},{
name: ' The Big Feast ' ,
y: 0.0006652164978467291 ,
color:"#FF5733",
},{
name: ' Orgazmo ' ,
y: 0.0006039973612029393 ,
color:"#FF5733",
},{
name: ' Bent ' ,
y: 0.00047764021584646666 ,
color:"#FF5733",
},{
name: ' Pink Flamingos ' ,
y: 0.00039843470813463673 ,
color:"#FF5733",
},{
name: ' Tokyo Decadence ' ,
y: 0.0002675231979413424 ,
color:"#FF5733",
},{
name: ' Man Bites Dog ' ,
y: 0.0001979367398531592 ,
color:"#FF5733",
},{
name: ' Chained ' ,
y: 0.006204461478904e-05 ,
color:"#FF5733",
}
,{
name: ' The Lion King 1994 ' ,
y: 0.26147885261744513 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Beauty and the Beast 1991 ' ,
y: 0.11630273554483538 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Hunchback of Notre Dame ' ,
y: 0.0863010375487224 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Secret Garden ' ,
y: 0.08253110067343736 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Sound of Music ' ,
y: 0.07588504584079123 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Bambi 1942 ' ,
y: 0.07105584658389433 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Babe ' ,
y: 0.06524941732946415 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Charlotte\'s Web ' ,
y: 0.03817547208967575 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Tale of Despereaux ' ,
y: 0.02398991734505535 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Rookie ' ,
y: 0.021394581337878135 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Rookie ' ,
y: 0.021341019016509186 ,
sliced:true,
color:"#FFAA00",
},{
name: ' My Fair Lady 1964 ' ,
y: 0.019108624607797248 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Babe: Pig in the City ' ,
y: 0.018329189694847987 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Little Rascals ' ,
y: 0.017750161433978465 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Ten Commandments 1966 ' ,
y: 0.017366261012108503 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Hachiko: A Dog\'s Story ' ,
y: 0.012648847449378404 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Black Stallion ' ,
y: 0.010021960525247894 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Giant ' ,
y: 0.008005557330788077 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Ramona and Beezus ' ,
y: 0.007283123524021923 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Prancer ' ,
y: 0.004928078239825991 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Kit Kittredge: An American Girl ' ,
y: 0.004681723907847047 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Three Coins in the Fountain ' ,
y: 0.0031816050709206414 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Little Princess ' ,
y: 0.002655433875629345 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Little Princess ' ,
y: 0.002655433875629345 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Secret Garden ' ,
y: 0.002312295117392995 ,
sliced:true,
color:"#FFAA00",
},{
name: ' The Quiet Man ' ,
y: 0.002015117295743652 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Lassie Come Home ' ,
y: 0.0011976091754457114 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Pollyanna ' ,
y: 0.0009942515846627004 ,
sliced:true,
color:"#FFAA00",
},{
name: ' A Sunday in the Country ' ,
y: 0.0006392746042249663 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Little Dorrit ' ,
y: 0.0002718230805716641 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Miracle of Marcelino ' ,
y: 0.0001571871985288343 ,
sliced:true,
color:"#FFAA00",
},{
name: ' La traviata ' ,
y: 5.183099794285635e-05 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Before the Wrath ' ,
y: 2.8899579394195828e-05 ,
sliced:true,
color:"#FFAA00",
},{
name: ' Through the Olive Trees ' ,
y: 1.0684890363175155e-05 ,
sliced:true,
color:"#FFAA00",}],
innerSize:'50%',
size:'57%'
}]
});
%%js
// Format a number as a whole-dollar amount with thousands separators, e.g. $1,234,567.
function dollarFormat(x) {
return '$' + Highcharts.numberFormat(x, 0, '.', ',');
}
var colors = Highcharts.getOptions().colors;
Highcharts.chart('container9', {
chart: {
type: 'column',
inverted: false,
height: 450,
width: 1100,
},
accessibility: {
series: {
descriptionFormatter: function (series) {
return series.type === 'line' ?
series.name + ', ' + dollarFormat(series.points[0].y) :
series.name + ' net profit, bar series with ' +
series.points.length + ' bars.';
}
},
point: {
valuePrefix: '$'
},
keyboardNavigation: {
seriesNavigation: {
mode: 'serialize'
}
}
},
title: {
text: 'Total Net Profit of Each System Rating in the Drama Genre',
margin: 35
},
subtitle: {
text: 'There are five System Ratings: R-rated | G-rated | PG-rated | PG-13-rated | NC-17-rated'
},
xAxis: {
visible: false,
accessibility: {
description: 'Movies',
rangeDescription: ''
}
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
yAxis: [{
min: 0,
max: 900000000,
tickInterval: 250000000,
labels: {
format: '${text}'
},
title: {
text: 'Movies Profit'
},
gridLineWidth: 1
}, {
accessibility: {
description: 'System Ratings Category Totals'
},
opposite: true,
min: 0,
max: 7000000000,
tickInterval: 1000000000,
gridLineWidth: 0,
labels: {
format: '${text}',
style: {
color: '#8F6666'
}
},
title: {
text: 'System Ratings Category Total',
style: {
color: '#8F6666'
}
}
}],
credits: {
enabled: false
},
plotOptions: {
column: {
keys: ['name', 'y'],
grouping: false,
pointPadding: 0.1,
groupPadding: 0,
tooltip: {
headerFormat: '<span style="font-size: 10px">' +
'<span style="color:{point.color}">\u25CF</span> ' +
'{series.name}</span><br/>',
pointFormat: '{point.name}: <b>${point.y:,.0f}</b><br/>'
}
},
line: {
yAxis: 1,
lineWidth: 5,
accessibility: {
exposeAsGroupOnly: true
},
marker: {
enabled: false
},
enableMouseTracking: false,
linkedTo: ':previous',
dataLabels: {
enabled: true,
verticalAlign: 'bottom',
style: {
color: '#757575',
fontWeight: 'normal'
},
formatter: function () {
if (this.point === this.series.points[Math.floor(
this.series.points.length / 2
)]) {
return 'Total: $' + Highcharts.numberFormat(this.y, 0);
}
}
}
}
},
responsive: {
rules: [{
condition: {
maxWidth: 400
},
chartOptions: {
chart: {
spacingLeft: 3,
spacingRight: 5
},
yAxis: [{}, {
visible: false
}]
}
}]
},
series: [{
name: 'System Rating R',
color: '#ff0000',
borderColor: '#A59273',
borderWidth: 1,
data: [
[ ' Django Unchained ' , 349948323 ],
[ ' Gone Girl ' , 307567189 ],
[ ' Priest ' , 24154026 ],
[ ' Fifty Shades Darker ' , 326398492 ],
[ ' Fifty Shades Freed ' , 316350619 ],
[ ' Crimson Peak ' , 19966854 ],
[ ' Zero Dark Thirty ' , 82112435 ],
[ ' The Master ' , 13147416 ],
[ ' Flight ' , 129558438 ],
[ ' The Ides of March ' , 54735925 ],
[ ' Nocturnal Animals ' , 9898681 ],
[ ' The Water Diviner ' , 8554727 ],
[ ' For Colored Girls ' , 17017873 ],
[ ' The Debt ' , 26604054 ],
[ ' Let Me In ' , 8270399 ],
[ ' Black Swan ' , 318266710 ],
[ ' Ex Machina ' , 25358392 ],
[ ' Room ' , 23262783 ],
[ ' If Beale Street Could Talk ' , 7859167 ],
[ ' Arbitrage ' , 23830713 ],
[ ' Stoker ' , 34913 ],
[ ' Carol ' , 31043521 ],
[ ' Quartet ' , 45178935 ],
[ ' Hereditary ' , 60133905 ],
[ ' Melancholia ' , 12417298 ],
[ ' Manchester by the Sea ' , 69233867 ],
[ ' We Need to Talk About Kevin ' , 3765283 ],
[ ' Addicted ' , 12499242 ],
[ ' Mommy ' , 12636004 ],
[ ' Take Shelter ' , 222016 ],
[ ' Boyhood ' , 53273049 ],
[ ' The Witch ' , 36954520 ],
[ ' Margin Call ' , 17033227 ],
[ ' Whiplash ' , 35669037 ],
[ ' Before Midnight ' , 20251930 ],
[ ' Silent House ' , 14610760 ],
[ ' Winter\'s Bone ' , 14131551 ],
[ ' The Florida Project ' , 9295324 ],
[ ' We Are Your Friends ' , 8153415 ],
[ ' Locke ' , 88390 ],
[ ' Knock Knock ' , 4328516 ],
[ ' Buried ' , 19282640 ],
[ ' Unsane ' , 12744931 ],
[ ' Blue Valentine ' , 15566240 ],
[ ' Martha Marcy May Marlene ' , 4438911 ],
[ ' Palo Alto ' , 156309 ],
[ ' Sound of My Voice ' , 294448 ],
[ ' A Ghost Story ' , 2669782 ],
[ ' Ordinary People ' , 48766923 ],
[ ' Fame ' , 68711836 ],
[ ' Endless Love ' , 14718173 ],
[ ' Ghost Story ' , 1851683 ],
[ ' Zoot Suit ' , 556082 ],
[ ' Rich and Famous ' , 1500000 ],
[ ' Raggedy Man ' , 2000000 ],
]
}, {
type: 'line',
name: 'System Rating R',
data: [
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978, 3278073978,
3278073978, 3278073978, 3278073978
],
color: '#ff1919'
}, {
name: 'System Rating NC-17',
color: '#d61111',
data: [
[ ' Shame ' , 13912841 ],
[ ' Matador ' , 4856268 ],
[ ' Whore ' , 8404 ],
[ ' Tokyo Decadence ' , 257845 ],
[ ' Wide Sargasso Sea ' , 659312 ],
[ ' Kids ' , 18912216 ],
[ ' Crash ' , 89410061 ],
[ ' The Dreamers ' , 121165 ],
[ ' Lust, Caution ' , 52091915 ],
[ ' Shame ' , 13912841 ],
[ ' Blue Is the Warmest Colour ' , 15465835 ],
[ ' The Dreamers ' , 307113 ],
[ ' Shame ' , 13912841 ],
[ ' Blue Is the Warmest Colour ' , 15390895 ],
[ ' Blue Valentine ' , 15566240 ],
[ ' Two Girls and a Guy ' , 1315026 ],
[ ' Elles ' , 256669 ],
[ ' Se, jie ' , 50167430 ],
[ ' The Evil Dead ' , 2311944 ],
[ ' Shame ' , 13912841 ],
[ ' Arabian Nights ' , 2548651 ],
[ ' Natural Born Killers ' , 16283563 ],
[ ' Clerks ' , 3664240 ],
[ ' Bad Lieutenant ' , 1038916 ],
[ ' Beyond the Valley of the Dolls ' , 8000000 ],
[ ' Kids ' , 18912216 ],
[ ' Crash ' , 94673038 ],
[ ' Last Tango in Paris ' , 34897711 ],
[ ' Pink Flamingos ' , 401802 ],
[ ' Lust, Caution ' , 50167430 ],
[ ' Happiness 1998 ' , 3546453 ],
[ ' Whore 1991 ' , 958404 ],
[ ' Law of Desire ' , 858737 ],
],
pointStart: 59
}, {
type: 'line',
name: 'System Rating NC-17',
data: [
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867, 759820867, 759820867,
759820867, 759820867, 759820867, 759820867
],
pointStart: 59,
color: '#d61111'
}, {
name: 'System Rating PG',
color: '#a10505',
data: [
[ ' Hugo ' , 47784 ],
[ ' Dolphin Tale ' , 59068724 ],
[ ' Wonder ' , 284604712 ],
[ ' The Last Song ' , 72678948 ],
[ ' War Room ' , 70975239 ],
[ ' The Lunchbox ' , 10531500 ],
[ ' Somewhere in Time ' , 4609597 ],
[ ' Urban Cowboy ' , 36918287 ],
[ ' Cinderella ' , 447351353 ],
[ ' War Room ' , 70986904 ],
[ ' Wonder ' , 285937718 ],
[ ' Little Women ' , 176601214 ],
[ ' Overcomer ' , 33102988 ],
[ ' The Jazz Singer ' , 26696000 ],
[ ' A Walk to Remember ' , 35694916 ],
[ ' Tuck Everlasting ' , 4344615 ],
[ ' Dreamer ' , 6741732 ],
[ ' The Lake House ' , 74830111 ],
[ ' Akeelah and the Bee ' , 10948425 ],
[ ' Bridge to Terabithia ' , 120587063 ],
[ ' August Rush ' , 34605762 ],
[ ' Fireproof ' , 32973297 ],
[ ' The Last Song ' , 69137047 ],
[ ' God\'s Not Dead ' , 62667874 ],
[ ' Mr. Holland\'s Opus ' , 83269971 ],
[ ' Phenomenon ' , 120036382 ],
[ ' Contact ' , 81120329 ],
[ ' The Spanish Prisoner ' , 3835130 ],
[ ' Sense and Sensibility ' , 118582776 ],
[ ' The Secret of Roan Inish ' , 3101815 ],
[ ' The Remains of the Day ' , 48954968 ],
[ ' Pure Country ' , 5164458 ],
[ ' Forever Young ' , 107956187 ],
[ ' A River Runs Through It ' , 31440294 ],
[ ' Honeysuckle Rose ' , 12815212 ],
[ ' Resurrection ' , 150297525 ],
[ ' Taps ' , 21856053 ],
[ ' On Golden Pond ' , 104285432 ],
[ ' Absence of Malice ' , 28716963 ],
[ ' The Night the Lights Went Out in Georgia ' , 7423752 ],
[ ' Rocky III ' , 108052686 ],
[ ' Tex ' , 544368315 ],
[ ' Staying Alive ' , 42892670 ],
[ ' Tender Mercies ' , 3943124 ],
[ ' Footloose ' , 71808942 ],
[ ' The Natural ' , 20000000 ],
],
pointStart: 96
}, {
type: 'line',
name: 'System Rating PG',
data: [
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794, 3752564794, 3752564794,
3752564794, 3752564794, 3752564794, 3752564794,
],
pointStart: 96,
color: '#a10505'
}, {
name: 'System Rating PG-13',
color: '#7a2f2f',
data: [
[ ' Gravity ' , 583698673 ],
[ ' Sing ' , 559454789 ],
[ ' Contagion ' , 77551594 ],
[ ' Burlesque ' , 35552675 ],
[ ' Creed II ' , 163591522 ],
[ ' The Post ' , 129748880 ],
[ ' Hereafter ' , 58660270 ],
[ ' Anna Karenina ' , 22004627 ],
[ ' Arrival ' , 156127894 ],
[ ' Charlie St. Cloud ' , 4478084 ],
[ ' Bridge of Spies ' , 122498338 ],
[ ' The Impossible ' , 129590606 ],
[ ' Water for Elephants ' , 78809717 ],
[ ' Creed ' , 136567581 ],
[ ' The Rite ' , 60143987 ],
[ ' Collateral Beauty ' , 49309093 ],
[ ' True Grit ' , 217276928 ],
[ ' The Tree of Life ' , 26721826 ],
[ ' The Longest Ride ' , 29802928 ],
[ ' Step Up Revolution ' , 132552290 ],
[ ' The Vow ' , 167618160 ],
[ ' The Age of Adaline ' , 38984536 ],
[ ' Safe Haven ' , 66050951 ],
[ ' The Best of Me ' , 15059418 ],
[ ' The Help ' , 188120004 ],
[ ' Dear John ' , 117033509 ],
[ ' The Lucky One ' , 71633833 ],
[ ' The Giver ' , 41540205 ],
[ ' Draft Day ' , 4847480 ],
[ ' Rings ' , 57917283 ],
[ ' Fences ' , 40282881 ],
[ ' Me Before You ' , 188265198 ],
[ ' The Light Between Oceans ' , 2281732 ],
[ ' The Book Thief ' , 57086711 ],
[ ' A Quiet Place ' , 317522294 ],
[ ' Beastly ' , 21028230 ],
[ ' The Roommate ' , 36545707 ],
[ ' Remember Me ' , 40506120 ],
[ ' The Woman in Black ' , 113955898 ],
[ ' Country Strong ' , 5601987 ],
[ ' One Day ' , 44168692 ],
[ ' Suffragette ' , 20044909 ],
[ ' The Perks of Being a Wallflower ' , 20069303 ],
[ ' Project Almanac ' , 20909437 ],
[ ' Wish Upon ' , 11477345 ],
[ ' If I Stay ' , 67356170 ],
[ ' Brooklyn ' , 51076141 ],
[ ' Everything, Everything ' , 51603136 ],
[ ' Mud ' , 21556959 ],
[ ' Amour ' , 27087044 ],
[ ' Ouija: Origin of Evil ' , 72831866 ],
[ ' Black or White ' , 12971021 ],
[ ' The Bye Bye Man ' , 23787727 ],
[ ' Gifted ' , 29964656 ],
[ ' The Words ' , 10369708 ],
[ ' Lights Out ' , 143806510 ],
[ ' Still Alice ' , 36699612 ],
[ ' Before I Fall ' , 13945682 ],
[ ' Rabbit Hole ' , 1205034 ],
[ ' Ida ' , 12698355 ],
[ ' Courageous ' , 33185884 ],
[ ' Mustang ' , 4152584 ],
[ ' Like Crazy ' , 3478400 ],
[ ' Another Earth ' , 1927779 ]
],
pointStart: 150
}, {
type: 'line',
name: 'System Rating PG-13',
data: [
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393, 5102398393,
5102398393, 5102398393, 5102398393, 5102398393,
],
pointStart: 150,
color: '#7a2f2f',
},{
name: 'System Rating G',
color: '#4d0909',
borderWidth: 1,
data: [
[ ' A Sunday in the Country ' , 1711143 ],
[ ' Prancer ' , 11587135 ],
[ ' The Rookie ' , 58693537 ],
[ ' Beauty and the Beast 1991 ' , 418656843 ],
[ ' The Little Rascals ' , 43947950 ],
[ ' Ramona and Beezus ' , 12469621 ],
[ ' The Black Stallion ' , 35099643 ],
[ ' The Hunchback of Notre Dame ' , 255500000 ],
[ ' Babe ' , 216100000 ],
[ ' Pollyanna ' , 1250000 ],
[ ' Lassie Come Home ' , 3851000 ],
[ ' Charlotte\'s Web ' , 58985708 ],
[ ' Kit Kittredge: An American Girl ' , 7657973 ],
[ ' The Rookie ' , 58491516 ],
[ ' The Secret Garden ' , 293281000 ],
[ ' The Sound of Music ' , 278014195 ],
[ ' The Tale of Despereaux ' , 30482317 ],
[ ' Bambi 1942 ' , 267142000 ],
[ ' My Fair Lady 1964 ' , 55071636 ],
[ ' Hachiko: A Dog\'s Story ' , 37707417 ],
[ ' Giant ' , 23794409 ],
[ ' The Ten Commandments 1966 ' , 52500000 ],
[ ' The Quiet Man ' , 5850377 ],
[ ' Three Coins in the Fountain ' , 10300000 ],
],
pointStart:216
}, {
type: 'line',
name: 'System Rating G',
data: [
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288,
3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288, 3179360288
],
pointStart: 216,
color: '#4d0909'
}]
});
%%js
Highcharts.chart('x',{
chart: {
width: 900,
height: 350
},
title:{
text:"What Movie Is The Most Successful?"
},
xAxis:{
categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest',
'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest',
'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:false
}
},
yAxis:{
min:0,
max:1000000000,
tickInterval:250000000,
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions:{
series:{
marker:{
states:{
hover:{
radiusPlus:12,
lineWidthPlus:5
}
}
}
}
},
tooltip:{
shared:false
},
states:{
hover:{
lineWidthPlus:10
}
},
series:[{
type:'column',
color:'#C21602',
name:'Profit',
data:[941214868.0, 583698673.0, 559454789.0, 544368315.0, 530998101.0, 447351353.0,
418656843.0, 349948323.0, 326398492.0, 318266710.0]
},{
type:'column',
color:'#F88379',
name:'Revenue',
data:[986214868, 693698673, 634454789, 549368315, 570998101, 542351353,
438656843, 449948323, 381398492, 331266710]
},{
type:'spline',
color:'gold',
name:'Cost',
data:[45000000.0, 110000000.0, 75000000.0, 5000000.0, 40000000.0,
95000000.0, 20000000.0, 100000000.0, 55000000.0, 13000000.0],
marker:{
lineWidth: 2,
lineColor: 'gold',
fillColor: 'white',
radius:2
}
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('n',{
chart: {
width: 900,
height: 300
},
title:{
text:""
},
xAxis:{
categories:['The Lion King 1994 | 1st Highest', 'Gravity | 2nd Highest', 'Sing | 3rd Highest', 'Tex | 4th Highest',
'Fifty Shades of Grey | 5th Highest', 'Cinderella | 6th Highest', 'Beauty and the Beast 1991 | 7th Highest',
'Django Unchained | 8th Highest', 'Fifty Shades Darker | 9th Highest', 'Black Swan | 10th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:true
}
},
yAxis: {
type: 'logarithmic',
custom: {
allowNegativeLog: true
},
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
valueSuffix:'%',
}
},
series: {
dataLabels: {
enabled: true,
valueSuffix:'%',
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
tooltip:{
valueSuffix:'%',
shared:true
},
series:[{
type:'column',
color:'#F57070',
name:'Net Profit Margin',
data:[95, 84, 88, 99, 93, 82, 95, 78, 86, 96]
},{
type:'column',
color:'#EC0303',
name:'Return On Investment Percentage',
data:[2092, 531, 746, 10887, 1327, 471, 2093, 350, 593, 2448]
}]
});
%%js
Highcharts.chart('no',{
chart: {
width: 900,
height: 350
},
title:{
text:"What Movie Is The Most Successful?"
},
xAxis:{
categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest',
'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest',
'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:false
}
},
yAxis:{
min:0,
max:400000000,
step:250000000,
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions:{
series:{
marker:{
states:{
hover:{
radiusPlus:12,
lineWidthPlus:5
}
}
}
}
},
tooltip:{
shared:false
},
states:{
hover:{
lineWidthPlus:10
}
},
series:[{
type:'column',
color:'#C21602',
name:'Profit',
data:[317522294.0, 316350619.0, 307567189.0, 293281000.0, 285937718.0, 284604712.0, 278014195.0, 267142000.0, 255500000.0, 217276928.0]
},{
type:'column',
color:'#F88379',
name:'Revenue',
data:[334522294, 371350619, 368567189, 311281000, 305937718, 304604712, 286214195, 268000000, 325500000, 252276928]
},{
type:'spline',
color:'gold',
name:'Cost',
data:[17000000.0, 55000000.0, 61000000.0, 18000000.0, 20000000.0, 20000000.0, 8200000.0, 858000.0, 70000000.0, 35000000.0],
marker:{
lineWidth: 2,
lineColor: 'gold',
fillColor: 'white',
radius:2
}
}]
});
%%js
(function (H) {
H.addEvent(H.Axis, 'afterInit', function () {
const logarithmic = this.logarithmic;
if (logarithmic && this.options.custom && this.options.custom.allowNegativeLog) {
// Avoid errors on negative numbers on a log axis
this.positiveValuesOnly = false;
// Override the converter functions
logarithmic.log2lin = num => {
const isNegative = num < 0;
let adjustedNum = Math.abs(num);
if (adjustedNum < 10) {
adjustedNum += (10 - adjustedNum) / 10;
}
const result = Math.log(adjustedNum) / Math.LN10;
return isNegative ? -result : result;
};
logarithmic.lin2log = num => {
const isNegative = num < 0;
let result = Math.pow(10, Math.abs(num));
if (result < 10) {
result = (10 * (result - 1)) / (10 - 1);
}
return isNegative ? -result : result;
};
}
});
}(Highcharts));
Highcharts.chart('xo',{
chart: {
width: 900,
height: 310
},
title:{
text:""
},
xAxis:{
categories:['A Quiet Place | 11th Highest', 'Fifty Shades Freed | 12th Highest', 'Gone Girl | 13th Highest', 'The Secret Garden | 14th Highest',
'Wonder | 15th Highest', 'Wonder | 16th Highest', 'The Sound of Music | 17th Highest', 'Bambi 1942 | 18th Highest',
'The Hunchback of Notre Dame | 19th Highest', 'True Grit | 20th Highest'],
crosshair:{
enabled:true
},
labels:{
enabled:true
}
},
yAxis:{
type: 'logarithmic',
},
legend: {
enabled: true,
verticalAlign: 'bottom',
symbolRadius: 20,
reversed: true
},
plotOptions: {
bar: {
dataLabels: {
enabled: true,
}
},
series: {
dataLabels: {
enabled: true,
style: {
textOutline: false ,
fontWeight: 'bold'
}
}
}
},
tooltip:{
valueSuffix:'%',
shared:true
},
series:[{
type:'column',
color:'#F57070',
name:'Net Profit Margin',
data:[95, 85, 83, 94, 93, 93, 97, 100, 78, 86]
},{
type:'column',
color:'#EC0303',
name:'Return On Investment Percentage',
data:[1868, 575, 504, 1629, 1430, 1423, 3390, 31135, 365, 621]
}]
});